Abstract

We consider the problem of rate/distortion with side information available only at the decoder. For
the case of jointly-Gaussian source X and side information Y, and mean-squared error distortion,
Wyner proved in 1976 that the rate/distortion function for this problem is identical to the conditional
rate/distortion function assuming the side information Y is available at the encoder. In this
paper we construct a structured class of asymptotically optimal quantizers for this problem: under the
assumption of high correlation between source X and side information Y, we show there exist quantizers
within our class whose performance comes arbitrarily close to Wyner's bound. As an application
illustrating the relevance of the high-correlation asymptotics, we also explore the use of these quantizers
in the context of a problem of data compression for sensor networks, in a setup involving a large number
of devices collecting highly correlated measurements within a confined area. An important feature of
our formulation is that, although the per-node throughput of the network tends to zero as network size
increases, so does the amount of information generated by each transmitter. This is a situation likely
to be encountered often in practice, which allows us to cast under a new (and more "optimistic") light
some negative results on the transport capacity of large-scale wireless networks.

I. Introduction

A. Large-Scale Wireless Sensor Networks 

Wireless networks span a wide spectrum in terms of their functionality (i.e., what they are used for), 
organization (i.e., how the different components are assembled to form a complete working system), and 
the technologies used to build them. A long-term project currently under way at Cornell deals with the 
design and prototyping of networks with the following defining characteristics: 

• The nodes operate under severe power constraints, support relatively large data transfer rates, and
their number and density are large.

• Once nodes are deployed, their mobility is very limited (if there is any at all). Instead, the main 
source of uncontrolled dynamics in the network is the temporary failure of individual nodes: this 
will typically happen either due to exhaustion of the power source (and for the duration of the 
"refueling" period), or due to variations in the wireless medium. 

In our setup of interest, the network is made up of devices whose functionality is essentially that of a 
traditional Cisco router, with the addition that they communicate over a wireless channel, their size is 
many orders of magnitude smaller, and they may come equipped with sensors that generate information 
locally as well. Such networks would prove extremely useful in a variety of very relevant scenarios, 
such as disaster relief operations, military and surveillance applications, cell-size reduction in cellular 
networks, environmental monitoring, etc. 

The development of a working network of this kind requires solutions to a number of technical challenges
(e.g., routing, flow control, source and channel coding, power control, modem design, hardware,
etc.). Among all these, of particular interest in this paper is the problem of source coding, in a scenario 
in which the data collected by a large number of sensors is highly correlated. When network nodes are 
coupled with devices that sense a spatial process at different locations (e.g., concentration of ozone in 
the atmosphere, spread of a pathogen/pollutant agent, temperature of a material, etc.), the measurements 
collected by each node will not be independent in general, but instead will be correlated, with a correlation 
structure determined by the corresponding fluid dynamics equations. Furthermore, the higher the density 
of nodes in the network, the higher the correlation in the measurements will be. Therefore, appropriate 
source coding capable of removing these dependencies has the potential to significantly reduce the number 
of bits to be transmitted (and therefore the consumption of scarce power resources), when compared to 
a coding strategy that treats all measurements as being independently generated. 

The use of standard and well understood source coding techniques is not appropriate in the context of
highly correlated sources: the use of classical source codes to remove redundancy in the measurements 
collected by different sensors requires that data be pooled at a common node prior to transmission. But 
this pooling action consumes valuable communication resources itself, thus defeating the very same goal it 
tries to achieve (communication efficiency). Therefore, distributed source coding techniques are required, 
i.e., codes capable of removing correlation among measurements even in the presence of uncertainty 
about the exact value measured at remote locations. To this end, we define a simple abstraction that 
captures the essential properties of this problem. First, we consider the source of information to be a
random process $(X_s)_{s \in [0,1]}$, defined over a bounded set, and with continuous sample paths; continuity
is one simple way of capturing in our model the notion of correlation among measurements increasing
with the number of nodes in a confined area. This process is observed by a finite number of sensors, and
these observations are to be communicated over a wireless network, as illustrated in Fig. 1. 




[Figure 1 graphic; recoverable labels: Source, Central Decoder, Estimate.]



Fig. 1. Network model. There are three types of nodes: sources, relays, and destination nodes, with n nodes of each type. 
There is a source (a random process whose statistics are known by all sources), from which each of the source nodes collects a 
sample. These samples are encoded by each source node without knowledge of the samples collected by other nodes, fed into 
the network, and each sent to a destination node. Finally, these destination nodes pool all their information at a central location, 
at which a decoder forms an estimate of the entire sample path, based on the data available from all sensors. A key aspect of 
our problem formulation is that each source node has to decide what information to send to the central decoder without explicit 
knowledge of the information available at other nodes — only with knowledge of the statistics of that correlated data. 

An important aspect of this problem setup is the fact that, as we increase the number of source nodes,
the amount of information contained in each sample tends to zero: because the source is continuous, two
nearby samples are almost the same. And we know from recent work on the transport capacity of one 
class of wireless networks that, again for large networks, the per-node throughput of networks in this 
class also tends to zero [22]. Therefore, provided that the rate at which information contained in each 
sample decays at least as fast as the throughput of the network, appropriate source coding techniques 
should enable an accurate reconstruction of the source at the central decoder of Fig. 1. A study of the 
resulting source coding problem in the context of these networks is the central subject of this paper. 

B. Rate Distortion with Side Information 

1) Problem Statement: Let $\{(X_n, Y_n)\}_{n=1}^{\infty}$ be a sequence of independent drawings of a pair of
dependent random variables X and Y, and let $D(x, \hat{x})$ denote a single-letter distortion measure. The
problem of rate distortion with side information at the decoder asks the question of how many bits
are required to encode the sequence $\{X_n\}$ under the constraint that $E\,D(x, \hat{x}) \le d$, assuming the side
information $\{Y_n\}$ is available to the decoder but not to the encoder [15, Ch. 14.9]. This problem, first
considered by Wyner and Ziv in [56], is a special case of the general problem of coding correlated 
information sources considered by Slepian and Wolf [44], in that one of the sources is available 
uncoded at the decoder. But it also generalizes the setup of [44], in that coding is with respect to a 
fidelity criterion rather than noiseless. One important motivation for us to consider this problem is the 
fact that good quantizers with side information will be used in the proof of scalability of a large sensor 
network. 

In [55], [56], Wyner and Ziv derive the rate/distortion function $R^*(d)$ for this problem, for general
sources and general (single letter) distortion metrics. In this work, however, we restrict our attention to
Gaussian sources and mean squared error (MSE) distortion. This case is of special interest because, under
these conditions, $R^*(d) = R_{X|Y}(d)$, the conditional rate/distortion function assuming Y is
available at the encoder [55], [56]. We are intrigued by the fact that there exist coding methods which can
perform as well as if they had access to the side information at the encoder, even though they do not. One
goal pursued in this paper, then, is the construction of a family of quantizers which realizes these promised
gains.
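
As a numerical illustration of the gap between $R_X(d)$ and $R_{X|Y}(d)$, the sketch below evaluates the standard closed forms for a scalar Gaussian source under MSE, namely $R_X(d) = \frac{1}{2}\ln(\sigma_X^2/d)$ and $R_{X|Y}(d) = \frac{1}{2}\ln(\sigma_X^2(1-\rho^2)/d)$, in nats per sample. The function names and the sample values are illustrative choices, not taken from the paper.

```python
import math

def rate_classical(var_x: float, d: float) -> float:
    """R_X(d) in nats/sample for a Gaussian source under MSE (0 when d >= var_x)."""
    return max(0.0, 0.5 * math.log(var_x / d))

def rate_conditional(var_x: float, corr: float, d: float) -> float:
    """R_{X|Y}(d) in nats/sample; the conditional variance is var_x * (1 - corr^2)."""
    cond_var = var_x * (1.0 - corr ** 2)
    return max(0.0, 0.5 * math.log(cond_var / d))

def rate_saving(var_x: float, corr: float, d: float) -> float:
    """Rate saved (nats/sample) by exploiting the side information."""
    return rate_classical(var_x, d) - rate_conditional(var_x, corr, d)

if __name__ == "__main__":
    # Illustrative values: unit-variance source, correlation 0.9, target distortion 0.01.
    print(rate_classical(1.0, 0.01), rate_conditional(1.0, 0.9, 0.01))
```

When both rates are positive, the saving equals $-\frac{1}{2}\ln(1-\rho^2)$ regardless of d, which grows without bound as the correlation coefficient approaches 1.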

2) Lattice Quantization with Side Information: High-rate quantization theory provides much of the 
motivation to consider lattices [20]. Under an assumption of fine quantization, the performance of an 
n-dimensional quantizer $\Lambda$ whose Voronoi cells are all congruent to a polytope P is given by

$$d = G(P) \cdot e^{-\frac{2}{n}\left(H(\Lambda, p_X) - h(p_X)\right)}, \qquad (1)$$

where $p_X$ is the joint source distribution in n dimensions, H is the discrete entropy induced on the
codebook $\Lambda$ by quantization of the source $p_X$, h is the differential entropy, and

$$G(P) = \frac{\frac{1}{n}\int_P \|x - \hat{x}\|^2\,dx}{\left(\int_P dx\right)^{1+\frac{2}{n}}}$$

is the normalized second moment of P (using MSE as a distortion measure) [18], [58].
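
To make the normalized second moment concrete, the following sketch estimates G(P) by midpoint-rule integration for two planar lattices: the square lattice (whose cell has G = 1/12) and the hexagonal lattice (whose cell has G = 5/(36√3)). The function name, the grid resolution, and the shortcut of searching only the four corners of the fundamental parallelogram are assumptions made for this illustration; the shortcut is valid for the two bases used here but not for an arbitrary basis.

```python
import math

def normalized_second_moment_2d(v1, v2, m=200):
    """Midpoint-rule estimate of G(P) for the Voronoi cell P of the planar
    lattice generated by v1 and v2.

    Assumes the lattice point nearest to any x in the fundamental parallelogram
    is one of its four corners (true for the square and hexagonal bases below).
    """
    area = abs(v1[0] * v2[1] - v1[1] * v2[0])
    corners = [(0.0, 0.0), v1, v2, (v1[0] + v2[0], v1[1] + v2[1])]
    total = 0.0
    for i in range(m):
        for j in range(m):
            a, b = (i + 0.5) / m, (j + 0.5) / m
            x = (a * v1[0] + b * v2[0], a * v1[1] + b * v2[1])
            total += min((x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2 for c in corners)
    mse = total / (m * m)   # mean squared quantization error per 2-D vector
    n = 2
    return mse / (n * area ** (2.0 / n))

if __name__ == "__main__":
    g_square = normalized_second_moment_2d((1.0, 0.0), (0.0, 1.0))
    g_hex = normalized_second_moment_2d((1.0, 0.0), (0.5, math.sqrt(3) / 2))
    print(g_square, g_hex)
```

The hexagonal cell yields the smaller value, consistent with the hexagonal lattice being the better quantizer in two dimensions.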

In the problem of rate distortion with side information, for Gaussian sources and MSE distortion, the
goal is to attain a distortion value d using $R_{X|Y}(d) < R_X(d)$ nats/sample. In (1) this means that, at fixed
rate $R_0$, we want to design quantizers that achieve distortion

$$d_0 = c_n \cdot e^{-\frac{2}{n}\left(nR_0 - h(p_{X|Y})\right)}$$

when coding X, where $c_n \le G(P)$ is the coefficient of quantization in n dimensions [18]. But since we
do not have access to Y (we only know $p_{X|Y}$), using classical quantizers we can only attain a distortion
value

$$c_n \cdot e^{-\frac{2}{n}\left(nR_0 - h(p_X)\right)} > d_0$$

(because $h(X|Y) < h(X)$), or equivalently, we need to use some extra rate $\rho = R_X - R_{X|Y}$ such that

$$d_0 = c_n \cdot e^{-\frac{2}{n}\left(n(R_0+\rho) - h(p_X)\right)}.$$

What makes this problem interesting is that we are only allowed to use $R_0$ nats/sample, not $R_0 + \rho$. One
way to do that has been proposed by Shamai, Verdú and Zamir in [42], [60], which consists of: (a) taking
a codebook with roughly $e^{n(R_0+\rho)}$ codewords and distortion $d_0$, (b) partitioning this codebook into $e^{nR_0}$
sets of size $e^{n\rho}$ each, (c) encoding only enough information to identify each one of the $e^{nR_0}$ sets, and
(d) using the side information Y to discriminate among the $e^{n\rho}$ codewords collapsed into each set. One
of our motivations for considering lattice codes is the fact that their structure makes it particularly easy
to express these partitioning operations described in [42].
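
Steps (a) through (d) can be sketched in one dimension as follows. This is a toy illustration under assumed choices (a uniform scalar codebook with spacing `step`, bins formed by taking the codeword index modulo `n_bins`), not the construction of [42], [60] itself; all names are hypothetical.

```python
def encode(x: float, step: float, n_bins: int) -> int:
    """Steps (a)-(c): quantize x on a uniform grid, then keep only the bin index."""
    return round(x / step) % n_bins

def decode(bin_index: int, y: float, step: float, n_bins: int) -> float:
    """Step (d): among the codewords sharing this bin index, pick the one closest to y."""
    coarse = step * n_bins        # spacing between codewords in the same bin
    offset = step * bin_index     # one representative codeword of the bin
    m = round((y - offset) / coarse)
    return offset + m * coarse

if __name__ == "__main__":
    step, n_bins = 0.1, 8
    x = 3.14
    index = encode(x, step, n_bins)
    print(index, decode(index, 3.3, step, n_bins))   # side information close to x
    print(index, decode(index, 3.6, step, n_bins))   # side information too far from x
```

Only one of `n_bins` indices is transmitted, yet the reconstruction has the resolution of the fine grid whenever y lies within half a coarse spacing of the quantized x; when y drifts further away, the decoder selects a wrong codeword from the same bin.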

We should also mention that another reason to consider lattices is our wish to answer a challenge posed 
by Zamir and Shamai in [60]. They present an encoding procedure very closely related to the one we 
propose here, they argue the existence of good lattices to use with that procedure, they study their distortion
performance, but they do not present any examples of concrete constructions: their paper concludes
by saying that (sic) "beyond the question of existence, it would be nice to find specific constructions of
good nested codes". Finding those specific constructions is one of the original contributions in this work.

C. Related Work

Note: this section contains relevant related work as of Fall 2004. 

1) Codes and Quantizers: The design of quantizers for the problem of rate distortion with side 
information was considered recently by Shamai, Verdú and Zamir, where they present design criteria
for two different cases: Bernoulli sources with Hamming metric, and jointly Gaussian sources with mean
squared error metric [42], [60]. The key contribution presented in that work is a constructive mechanism 
for, given a codebook, using the side information at the decoder to reduce the amount of information that 
needs to be encoded to identify codewords, while at the same time achieving essentially the distortion 
of the given codebook. That work provided much inspiration for our work on the design of lattice codes 
presented in this paper. 

Other work on code constructions includes the application of similar codebook partitioning ideas in 
the context of trellis codes [35], a preliminary version of this work [39], generalizations to the case when
the side information may be coded as well [36], [62], constructions based on LDPC codes [1], [31], [48], 
and other code constructions [29], [37]. 

2) Information-Theoretic Performance Bounds: Whereas there has been some interest in recent times 
on the more practical aspects of these problems, a significant amount of work on related topics had 
already been done before in the context of multiuser information theory. Specifically on the problem of 
rate/distortion with side information, besides the above mentioned work of Wyner and Ziv [55], [56], 
Kaspi and Berger present a summary of known results and a number of new results (as of 1982) in [25], 
leaving only a couple of special cases still open. Heegard and Berger further generalize to the case 
when there is uncertainty on whether the side information is available at the decoder or not [24]. For 
an arbitrary pair of sources, Zamir gives bounds on how far away the conditional rate/distortion function 
and the Wyner-Ziv rate/distortion function can be from each other [59]. 

Closely related to the problem of rate/distortion with side information is that of Noiseless Coding of 
Distributed Correlated Sources. Slepian and Wolf formulate this problem, and determine the minimum 
number of bits per symbol required to encode two correlated sequences {Xn} and {Yn} separately, such 
that they can be faithfully reproduced by a centralized decoder, under the assumption that $\{(X_n, Y_n)\}_{n=1}^{\infty}$
is i.i.d. [44]. Cover then gives a simpler proof of the same result, which also generalizes to arbitrary 
ergodic processes, countably infinite alphabets, and arbitrary number of correlated sources [13]. Wyner 
presents an information theoretic characterization of the minimum rates required for faithful reproduction 
in a general network with side information [54]. Barros and Servetto consider the Slepian-Wolf problem 
in an arbitrary network setup with noisy point-to-point links [4]. 

A long-standing open problem in network information theory is the characterization of the rate-distortion
region for the Multiterminal Source Coding problem, which is basically the Slepian-Wolf
problem, but in which a non-zero distortion is allowed in the encoding of both sources. The most 
significant contribution to this date can be found in Tung's doctoral dissertation [50]. Berger developed 
some useful notes for a tutorial lecture on this and related problems [5]. 

Yet another closely related problem is the CEO Problem. In this version, multiple sensors observe 
noisy versions of the same signal, and must convey their observations to a centralized decoder at a 
combined rate of not more than R bits/sample. This case generalizes the problem of encoding correlated 
observations, to the case when the number of sensors is large, and to the case when the signal to be 
communicated cannot be observed directly. Berger et al. present a solution to this problem in the general 
case [6]. Viswanathan and Berger specialize the results of [6] to the Quadratic-Gaussian case [53]: an 
interesting conclusion in this case is that the optimal rate of decay of the error is of the form $1/R$ when
the sensors cannot communicate prior to transmission, as opposed to an exponential decay otherwise. 

An interesting duality between the problem of rate/distortion with side information discussed above, 
and the problem of channel coding with side information at the transmitter [12], has been pointed out 
by several groups [3], [34], [46]. Cover and Chiang present a comprehensive coverage of duality issues 
in problems with side information [14], and Chiang and Boyd fully develop an optimization-theoretic 
approach to analyzing the duality of channel capacity and rate distortion problems [9]. Merhav and 
Shamai established a separation theorem in this context [30]. Therefore, it should be possible to derive 
good codes for one problem from good codes available for the other. 

Zamir et al. present a very interesting tutorial on noisy multiterminal networks, with many useful 
references [61]. 

3) Performance of Wireless Networks: A key result in the analysis of performance of wireless networks
states that when n non-mobile nodes are optimally placed in a disk of unit area, traffic patterns are
optimally assigned, and the range of each transmission is optimally chosen, the total throughput that the
network can carry is $\Theta(\sqrt{n})$ [22]. As a result, the per-node throughput is only $\Theta(1/\sqrt{n})$, i.e., decays to
zero as the number of nodes in the network increases. Other results along the same lines were presented
in [23], [57].

The work of [22] sparked significant interest in this problem. When nodes are allowed to move, assuming
transmission delays proportional to the mixing time of the network, the total network throughput is
$\Theta(n)$, and therefore the network can carry a non-vanishing rate per node [21]. Using a linear programming
formulation, non-asymptotic versions of the results in [22] are given in [49]. Using pure network flow
methods, similar results (and generalizations thereof) have been obtained in [32], [33]. An alternative
method for deriving transport capacity was presented in [27].

D. Main Contributions and Organization of the Paper 
This paper presents the following original contributions: 

• The construction of lattice codes for the problem of rate/distortion with side information. We propose 
a design procedure based on the choice of a lattice that is a good quantizer for the classical 
rate/distortion problem, and a geometrically-similar sublattice, inspired by the idea of partitioning 
codebooks to obtain good codes for this problem proposed in [42], [60], and by our previous work 
on the design of lattice quantizers for multiple description coding [51]. 

• An asymptotic analysis (in rate and correlation) of the performance of these codes which, to the best
of our knowledge, is the first such analysis for Wyner-Ziv codes. Our analysis reveals some interesting
shortcomings of these codes, and suggests a simple modification to the construction that ensures
their optimality. These optimal codes effectively answer a challenge of Zamir and Shamai [60].

• The illustration that high correlation asymptotics in source coding are indeed a new asymptotic
regime with very meaningful practical implications. So far source coding has considered two asymptotic
regimes: large block asymptotics [43], or high rate asymptotics [58]. High correlation asymptotics
are a new asymptotic regime that, as we will see in Section IV, proves quite relevant in the
context of new problems derived from sensor networking applications.

• The identification of a large class of applications for which the vanishing rates property of wireless 
networks does not pose a problem, by virtue of the fact that the amount of information that each 
node needs to transmit decays at the same rate as (or faster than) throughput does. 

The rest of this paper is organized as follows. In Section II we present the structure of lattice quantizers 
for the problem of rate/distortion with side information, and in Section III we evaluate the performance
of the codes obtained, under the assumption of high correlation between the source X and the side
information Y. In Section IV we illustrate how the proposed codes can be used to deal effectively with
the vanishing rates property of an important class of large-scale sensor networks. Final remarks are 
presented in Section V. 

II. Design of Lattice Codes with Side Information

A. Definitions 

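A source generates a sequence of zero-mean iid pairs $\{(x_i, y_i)\}_{i=0}^{\infty}$, with jointly Gaussian distribution

$$f_{XY}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{x^2}{\sigma_X^2} - \frac{2\rho xy}{\sigma_X\sigma_Y} + \frac{y^2}{\sigma_Y^2}\right]\right),$$

with covariance matrix $K = \begin{bmatrix}\sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2\end{bmatrix}$, and correlation coefficient $\rho$. The corresponding
conditional and marginal densities are denoted by $f_{Y|X}$, $f_{X|Y}$, $f_X$, $f_Y$. For a set of n linearly independent
column vectors $\{v_1, \ldots, v_n\}$, a lattice $\Lambda \subset \mathbb{R}^n$ is defined by

$$\Lambda = \left\{\sum_{i=1}^{n} c_i v_i : c_1, \ldots, c_n \in \mathbb{Z}\right\},$$

and its generator matrix is $V = [v_1 | \ldots | v_n]$. The volume of a polytope $P \subset \mathbb{R}^n$ is denoted by $\nu(P)$. For a
constant $s \in \mathbb{R}$, the scaled lattice $s\Lambda$ is the lattice generated by $sV$, where V is the generator matrix of
a lattice $\Lambda$. The Voronoi cell of a lattice point $\lambda$ in the lattice $\Lambda$ is defined by

$$V[\lambda:\Lambda] = \{x \in \mathbb{R}^n : \|x-\lambda\|^2 \le \|x-\lambda'\|^2,\ \forall \lambda' \in \Lambda\}.$$

The nearest neighbor map of a lattice is a function $Q_\Lambda : \mathbb{R}^n \to \Lambda$, defined by

$$Q_\Lambda(x) = \arg\min_{\lambda \in \Lambda} \|x - \lambda\|^2,$$

where ties are broken arbitrarily (e.g., numbering all the $\lambda$'s, and assigning x to the $\lambda$ with smallest
index). From the definitions it follows trivially that $V[\lambda:\Lambda] = \{x \in \mathbb{R}^n : Q_\Lambda(x) = \lambda\}$, except possibly
for a set of measure zero. A lattice $\Lambda'$ is a sublattice of a lattice $\Lambda$ if $\Lambda' \subseteq \Lambda$. The quotient group [8,
Sec. 6.3] of a lattice modulo a sublattice is denoted by $\Lambda/\Lambda'$, and its order by $|\Lambda/\Lambda'|$.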
A Wyner-Ziv Lattice Vector Quantizer (WZ-LVQ) is a triplet $\mathcal{Q} = (\Lambda, \kappa, s)$, where:

• $\Lambda$ is a lattice.

• $\kappa : \mathbb{R}^n \to \mathbb{R}^n$ is a linear operator such that $\kappa u \cdot \kappa v = c\, u \cdot v$ (for some c > 0), and such that
$\kappa(\Lambda) \subseteq \Lambda$. Essentially, $\kappa$ defines a similar sublattice of $\Lambda$.¹

• $s \in (0, \infty)$ is a scale factor that expands (or shrinks) $\Lambda$ and $\kappa(\Lambda)$.

Intuitively, the lattice $\Lambda$ is the fine codebook, the one whose codewords are to be partitioned into
equivalence classes. We choose to implement this partition by considering a sublattice $\Lambda' \subseteq \Lambda$, and then
considering the resulting quotient group $\Lambda/\Lambda'$. The constant s multiplies the generator matrices of
the lattices considered, and is to be adjusted as a function of the correlation between the source X
and the side information Y. A justification for the choice of a similar sublattice (as opposed to any
other sublattice) to implement the codebook partition, and a justification for the explicit introduction of
a scale factor s as a parameter of the quantizer (as opposed to having this lattice scale be determined
by the coding rate, as in classical quantization theory) will become apparent later, after we study the
rate-distortion performance of the proposed quantizers.

¹Two lattices $\Lambda_1$, $\Lambda_2$ (with generator matrices $M_1$, $M_2$) are said to be similar when there is a constant $c \neq 0$, an integer
matrix U with $|\det(U)| = 1$, and a real matrix B with $BB^T = I$, such that $M_2 = c\, U M_1 B$ [11]. Intuitively, similar lattices
"look the same", up to a rotation, a reflection, and a change of scale.

The question of the existence of similar sublattices arose in connection with another vector quantization 
problem [51], and also in the study of symmetries of quasicrystals [2]. The subject is thoroughly covered 
in [10], where necessary (and in some cases sufficient) conditions are given for their existence. 

B. Encoding/Decoding Algorithms 

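Let $X^n$ denote a block of n source samples, and $Y^n$ a block of n side information samples. The
encoder and decoder are maps $f_n : \mathbb{R}^n \to s\Lambda/s\kappa(\Lambda)$ and $g_n : s\Lambda/s\kappa(\Lambda) \times \mathbb{R}^n \to s\Lambda$, defined by

$$f_n(X^n) = Q_{s\Lambda}\left(X^n - Q_{s\kappa(\Lambda)}(X^n)\right), \qquad \hat{X}^n = g_n(f_n(X^n), Y^n) = Q_{s\kappa(\Lambda)+f_n(X^n)}(Y^n), \qquad (2)$$

whose operation is illustrated in Fig. 2, with an example based on the lattice $A_2$.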
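
The maps in (2) can be sketched for the hexagonal lattice $A_2$ by identifying the plane with the complex numbers, so that $A_2$ becomes the set of Eisenstein integers and a similarity $\kappa$ becomes multiplication by one of them. The helper names, the choice $\kappa = 5 + \omega$ (index 21), and the four-corner nearest-neighbor search are assumptions made for this illustration rather than the paper's implementation.

```python
import math

OMEGA = complex(-0.5, math.sqrt(3) / 2)   # omega = e^{2 pi i / 3}

def nearest_a2(z: complex) -> complex:
    """Nearest point of A2 = {a + b*omega : a, b integers} to z."""
    b = z.imag / OMEGA.imag               # real coordinates of z in the basis (1, omega)
    a = z.real + 0.5 * b
    best = None
    # The enclosing parallelogram splits into two equilateral triangles,
    # so the nearest lattice point is one of its four corners.
    for da in (0, 1):
        for db in (0, 1):
            cand = (math.floor(a) + da) + (math.floor(b) + db) * OMEGA
            if best is None or abs(z - cand) < abs(z - best):
                best = cand
    return best

def quantize(z: complex, gen: complex) -> complex:
    """Nearest point to z of the lattice gen * A2 (gen: nonzero scaling/rotation)."""
    return gen * nearest_a2(z / gen)

def wz_encode(x: complex, s: float, kappa: complex) -> complex:
    """f(x) = Q_{s Lambda}(x - Q_{s kappa(Lambda)}(x)): a coset representative."""
    coarse = quantize(x, s * kappa)
    return quantize(x - coarse, s)

def wz_decode(gamma: complex, y: complex, s: float, kappa: complex) -> complex:
    """g(gamma, y): the point of the coset s*kappa(Lambda) + gamma nearest to y."""
    return gamma + quantize(y - gamma, s * kappa)

if __name__ == "__main__":
    s, kappa = 0.1, 5 + OMEGA
    x = complex(1.23, 0.77)
    y = x + complex(0.05, 0.02)           # side information close to x
    print(wz_decode(wz_encode(x, s, kappa), y, s, kappa), quantize(x, s))
```

When y stays within the Voronoi cell of the coarse lattice centered at the fine quantization of x, the decoder output coincides with that fine quantization; displacing y by a full coarse lattice vector shifts the output by the same vector, which is the decoding error described in Fig. 2.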

C. Rate Computation

There are only $N = |\Lambda/\kappa(\Lambda)|$ possible different quantizer outputs, each one with probability $p_k$
(k = 1...N) given by

$$p_k = \sum_{\lambda \in s\kappa(\Lambda)+\gamma_k} \int_{V[\lambda:s\Lambda]} f_X(x)\,dx,$$

where $\gamma_k \in s\Lambda/s\kappa(\Lambda)$, and where we identify the entire equivalence class with a canonical representative
taken from $\Lambda \cap V[0:\kappa(\Lambda)]$. The rate of a quantizer is then given by

$$R = -\frac{1}{n}\sum_{k=1}^{N} p_k \ln p_k,$$

expressed in units of nats per source sample.

Fig. 2. To illustrate the mechanics of the proposed quantizers (left: encoding, right: decoding). A sublattice similar to the base
lattice is chosen (circled points), matched to how far apart $X^n$ and $Y^n$ are expected to be: in this example, with high probability $X^n$
and $Y^n$ are in neighboring Voronoi cells of the fine lattice. Then $X^n$ is quantized first with the coarse lattice, then this coarse
description is subtracted from $X^n$, and this difference is quantized again with the fine lattice; this quantized difference is then
sent to the decoder, as a representative of the set of all codewords collapsed into the same equivalence class. At the decoder,
the entire class is recreated (all the points with a thick arrow in the right picture), and among these, the point closest to the
side information $Y^n$ is declared to be the original quantized value for $X^n$. Note that there is always a chance that a particular
realization of the noise process may take $Y^n$ too far away from $X^n$, in which case a decoding error occurs.

Assume now, as is standard in fine-resolution quantization theory, that Voronoi cells of the quantizers
under consideration are small. In this case, this translates into a requirement for sublattice cells to be
small, for which we have that

$$\nu(s\kappa(\Lambda)) = s^n\,\nu(\kappa(\Lambda)) = s^n N\,\nu(\Lambda) = s^n N,$$

where the second equality follows from the fact that $N = |\Lambda/\kappa(\Lambda)| = c^{n/2}$, where c is the norm of the
similarity defined by $\kappa$ [10] (and therefore $\kappa = \sqrt{c}\,U$, with corresponding scaling $\sqrt{c}$ and U unitary), and the last
equality follows from assuming $\Lambda$ is normalized to have determinant 1 [11]. Then, we see that requiring
small sublattice cells translates into requiring that $s^n N$ be a small number. Now, under this assumption,
the rate expression above admits a much simpler form. Re-indexing the sum,

$$p_k = \sum_{\lambda \in \Lambda} \int_{V[s\kappa(\lambda)+\gamma_k:s\Lambda]} f_X(x)\,dx,$$

and the integral of the source density in $p_k$ can be approximated by

$$f_X(s\kappa(\lambda)+\gamma_k) \cdot \nu(V[s\kappa(\lambda)+\gamma_k : s\Lambda]).$$

But assuming small cells for the sublattice (standard in quantization theory), since the Gaussian source
is continuous, we have that within a cell of $\kappa(\Lambda)$ $f_X$ is approximately constant, and hence independent
of the particular shift $\gamma_k$. Furthermore, since $\Lambda$ is a lattice, all its cells are congruent, and therefore
their volumes are all the same, thus making $\nu$ also independent of the particular shift $\gamma_k$. Call p this
(approximately) constant value for $p_k$. Therefore, we have

$$1 = \sum_{\gamma_k \in \Lambda/\kappa(\Lambda)} p_k \approx |\Lambda/\kappa(\Lambda)|\,p,$$

and hence

$$p \approx \frac{1}{|\Lambda/\kappa(\Lambda)|}, \qquad \text{and} \qquad R \approx \frac{1}{n}\ln|\Lambda/\kappa(\Lambda)|,$$

independent of s and $f_X$, where the approximations are tight in the limit as $s^n N \to 0$.

Note that, unlike in classical quantization theory, here the rate of a quantizer seems to be independent
of the size of its Voronoi cells. In our context, a high-rate assumption translates into a large value for
$|\Lambda/\kappa(\Lambda)|$, i.e., cells in the fine lattice are small relative to the size of cells in the coarse lattice. But the
parameter s, which determines the absolute size of these cells, is not part of the rate expression.
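
The approximation of the rate by $\frac{1}{n}\ln N$ can be checked numerically in the simplest case. The sketch below assumes a scalar setup (n = 1, $\Lambda = \mathbb{Z}$, $\kappa(\Lambda) = N\mathbb{Z}$) with a zero-mean Gaussian source, and computes the exact entropy of the coset index from the Gaussian distribution function. The function name and parameter values are illustrative only.

```python
import math

def bin_entropy(s: float, n_bins: int, sigma: float = 1.0) -> float:
    """Entropy (nats) of the coset index for Lambda = Z, kappa(Lambda) = n_bins * Z,
    scale factor s, and a zero-mean Gaussian source with standard deviation sigma."""
    def cdf(t: float) -> float:
        return 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))

    reach = int(math.ceil(10.0 * sigma / s))      # cover roughly +/- 10 sigma
    p = [0.0] * n_bins
    for j in range(-reach, reach + 1):
        # Probability of the fine cell centered at s*j, credited to coset j mod n_bins.
        p[j % n_bins] += cdf(s * (j + 0.5)) - cdf(s * (j - 0.5))
    return -sum(q * math.log(q) for q in p if q > 0.0)

if __name__ == "__main__":
    print(bin_entropy(0.01, 8), math.log(8))      # small s*N: entropy close to ln N
    print(bin_entropy(1.0, 8), math.log(8))       # large s*N: entropy below ln N
```

With $sN = 0.08$ the coset indices are essentially equiprobable and the entropy matches $\ln 8$, whereas with $sN = 8$ the sublattice cells are no longer small relative to the source density and the entropy falls visibly below $\ln 8$.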



D. Distortion Computation 

Let $\gamma_k(x)$ denote the encoding of a source sequence x (k = 1...N), and $\gamma(x, y)$ denote the reconstruction
codeword for a source sequence x with side information y. Then:

$$\begin{aligned}
d &\overset{(a)}{=} \frac{1}{n}\int_{x\in\mathbb{R}^n}\int_{y\in\mathbb{R}^n} \|x-\gamma(x,y)\|^2 f_{XY}(x,y)\,dx\,dy \\
&= \frac{1}{n}\int_{x\in\mathbb{R}^n}\left[\int_{y\in\mathbb{R}^n} \|x-\gamma(x,y)\|^2 f_{Y|X}(y|x)\,dy\right] f_X(x)\,dx \\
&\overset{(b)}{=} \frac{1}{n}\int_{x\in\mathbb{R}^n}\left[\sum_{\lambda\in s\kappa(\Lambda)+\gamma_k(x)}\int_{y\in V[\lambda:s\kappa(\Lambda)+\gamma_k(x)]} \|x-\lambda\|^2 f_{Y|X}(y|x)\,dy\right] f_X(x)\,dx \\
&\overset{(c)}{=} \frac{1}{n}\int_{x\in\mathbb{R}^n}\left[\sum_{\lambda\in s\kappa(\Lambda)+\gamma_k(x)} \|x-\lambda\|^2 \Pr\left(y \in V[\lambda : s\kappa(\Lambda)+\gamma_k(x)] \mid x\right)\right] f_X(x)\,dx \\
&\triangleq \frac{1}{n}\int_{x\in\mathbb{R}^n} \alpha(x, s\kappa(\Lambda)+\gamma_k(x))\, f_X(x)\,dx, \qquad (3)
\end{aligned}$$

where:

(a) is just the definition of average distortion;

(b) follows from, for each possible source sequence x, partitioning the set of all side information vectors
y into Voronoi cells of the sublattice $s\kappa(\Lambda)$, centered at location $\gamma_k(x)$;

(c) follows from the fact that $\|x - \lambda\|^2$ can be taken out of the integral, and what remains is an integral
of the conditional density function.

The last definition is introduced to highlight the concept that in quantization with side information,
an entire sublattice plays the role of a single codeword in classical quantization: the average error in
reconstructing x is seen to take the form of an expectation of a suitably defined distortion metric between
source sequences and sublattices. In Section III we study the asymptotic behavior of (3), assuming high
correlation between $X^n$ and $Y^n$.
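
The behavior of (3) can be explored empirically in a toy scalar setting. The sketch below is a Monte Carlo estimate under assumed choices ($\Lambda = \mathbb{Z}$, $\kappa(\Lambda) = N\mathbb{Z}$, unit-variance X and Y, a fixed random seed); the function name and all parameter values are hypothetical and serve only to illustrate the two regimes.

```python
import math
import random

def simulate_mse(s: float, n_bins: int, corr: float,
                 trials: int = 20000, seed: int = 0) -> float:
    """Empirical per-sample MSE of a scalar quantizer with side information.

    Assumed toy model (not from the paper): Lambda = Z, kappa(Lambda) = n_bins * Z,
    unit-variance X and Y with correlation coefficient corr.
    """
    rng = random.Random(seed)
    coarse = s * n_bins                             # cell width of s*kappa(Lambda)
    total = 0.0
    for _ in range(trials):
        x = rng.gauss(0.0, 1.0)
        y = corr * x + math.sqrt(1.0 - corr ** 2) * rng.gauss(0.0, 1.0)
        gamma = s * (round(x / s) % n_bins)         # encoder: coset representative
        x_hat = gamma + coarse * round((y - gamma) / coarse)  # coset point nearest to y
        total += (x - x_hat) ** 2
    return total / trials

if __name__ == "__main__":
    s = 0.01
    print(simulate_mse(s, 64, 0.999), s ** 2 / 12)  # coarse cells wide enough
    print(simulate_mse(s, 4, 0.999), s ** 2 / 12)   # coarse cells too narrow
```

When the coarse cells are wide compared with the spread of Y around X, decoding errors are negligible and the distortion is close to the fine-quantizer value $s^2/12$; when they are too narrow, wrong coset members are selected often and the distortion is dominated by decoding errors.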

E. On the Choice of Similar Sublattices 

As we will see in Section III, there are some drawbacks to implementing quantizers for the Wyner-Ziv 
problem with a fine quantizer that is essentially a truncated lattice, as follows from the construction given 
here. But there are also significant benefits to doing so, in terms of the simplicity of this implementation. 
So for the time being, if we are going to use two lattices, it is of interest to consider what kind of lattices 
should be used. 

Suppose we fix the scale factor s, and the code rate $\frac{1}{n}\ln(N)$. Among all the sublattices of $\Lambda$ of index
N, are there differences in terms of their distortion performance? Which sublattices should we choose?
It follows from (3) that a sensible design criterion is to choose the sublattice which results in maximizing
$\Pr\{y \in V[0:s\kappa(\Lambda)] \mid X = x\}$, for $x \in V[0:s\Lambda]$.

Since the vectors X and Y are jointly Gaussian and with iid components, the vector $Y \mid X = x$ is also
Gaussian and with iid components (although the $X_j$'s and the $Y_j$'s are certainly not independent of each
other). The pdf of $Y \mid X = x$ is therefore circularly symmetric, and it follows from classical arguments
of coding for Gaussian channels that, to maximize $\Pr(y \in V)$, we need to maximize the norm of the
shortest vectors in $\kappa(\Lambda)$. This situation is illustrated in Fig. 3, with an example based on the lattice $A_2$.

The choice of $A_2$ for illustration purposes in Fig. 3 is not arbitrary. In that particular case, it is known
that the minimal norm $\mu$ of any sublattice of index N in $A_2$ satisfies $\mu \le N$, and that $\mu = N$ if and only
if the sublattice is ideal [7]. Furthermore, in two dimensions, $A_2$ is both the best classical quantizer and
the best channel coder [11]. Therefore, it seems clear that a hexagonal lattice and a similar sublattice
are the best design choices in two dimensions: this combination simultaneously minimizes quantization
error, and minimizes the probability of a source vector being decoded to an incorrect codeword.

[Figure 3 panels, both of index N = 21. Left: ideal sublattice generated by $5+\omega$, minimum squared norm 21. Right: non-ideal sublattice generated by $[5+2\omega,\ 2+5\omega]$, minimum squared norm 19.]

Fig. 3. Two different sublattices of $A_2$, of index N = 21. $A_2$ is isomorphic to the ring of Eisenstein integers $\{a + b\omega :
a, b \in \mathbb{Z};\ \omega = [-\frac{1}{2}, \frac{\sqrt{3}}{2}] = e^{2\pi i/3}\}$, and ideal sublattices refer to ideals of this ring. Observe that the ideal sublattice of the
example has shortest vectors of norm 21, whereas in the non-ideal sublattice the shortest vectors are shorter.
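
The numbers quoted in Fig. 3 can be reproduced with integer arithmetic, using the fact that the squared norm of $a + b\omega$ is $a^2 - ab + b^2$. The sketch below is an illustrative brute-force check; the function names and the search range `reach` are assumptions, and the ideal generated by $5 + \omega$ is represented by the basis $\{5 + \omega,\ \omega(5 + \omega) = -1 + 4\omega\}$.

```python
def eisenstein_norm(a: int, b: int) -> int:
    """Squared Euclidean norm of a + b*omega, with omega = e^{2 pi i / 3}."""
    return a * a - a * b + b * b

def sublattice_stats(u, v, reach: int = 10):
    """Index and minimum squared norm of the sublattice of A2 spanned by
    u = (a1, b1) and v = (a2, b2), given in coordinates w.r.t. (1, omega)."""
    index = abs(u[0] * v[1] - u[1] * v[0])
    min_norm = min(
        eisenstein_norm(m * u[0] + k * v[0], m * u[1] + k * v[1])
        for m in range(-reach, reach + 1)
        for k in range(-reach, reach + 1)
        if (m, k) != (0, 0)
    )
    return index, min_norm

if __name__ == "__main__":
    print(sublattice_stats((5, 1), (-1, 4)))   # ideal generated by 5 + omega
    print(sublattice_stats((5, 2), (2, 5)))    # non-ideal sublattice of Fig. 3
```

Both sublattices have index 21, but only the ideal one attains the bound $\mu = N$; the non-ideal one has shortest vectors of squared norm 19, and is therefore the worse choice under the criterion above.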



Another interesting example is that of very high dimensional spaces. In this case, we know that good 
quantizers have (nearly) spherical Voronoi cells. But at the same time, spherical cells maximize the 
minimum distance between sublattice points, and therefore an optimal sublattice will have to be similar 
to the base lattice. 

In between dimensions 2 and $\infty$, we are not able to make equally strong statements, but we use
the insights derived from these extreme cases (a lattice with small second-order moment and a similar
sublattice) as guiding principles, to curb the complexity of the design task.

III. Asymptotics of Quantizers with Side Information

A. Modeling Assumptions and Performance Metric 

1) Modeling Assumptions: Our goal in this section is to find a simpler expression for $d$ than that
presented in Section II-D. To do so, we work under some extra assumptions:

• The correlation coefficient $\rho$ between $X$ and $Y$ is close to 1.

• The coding rate $R$ is large.

• The scale factor $s$ is small.

The effect of these assumptions is illustrated in Fig. 4. 
[Figure 4: one-dimensional sketch with curves labeled $f_{X|Y}(x|y)$ and $f_X(x)$, drawn above an axis marked with lattice cell boundaries.]

Fig. 4. Illustration (in one dimension) of the meaning of the asymptotic regime considered in this work. Working under an
assumption of high correlations, we have that the conditional distribution of the source $x$ given side information $y$ is sharply
concentrated around its mean value $y$; as a result, we can make the probability of the source $x$ being away from $y$ by more than any
positive constant arbitrarily small (by choosing $\rho$ close enough to 1), and hence we can assume that sublattice cells, while
being vanishingly small themselves ($s \approx 0$), can be considered large enough to contain most of the probability in $f_{X|Y}$. Then,
because we take $R$ large, we further partition each sublattice cell into a large number of much smaller fine lattice cells.

The basic intuition on which our analysis in this section is built is very simple: by considering high
enough correlations, the encoder can "roughly center" the conditional distribution $f_{X|Y}$ at the centroid
of a sublattice cell, a cell that is large enough to make the probability that the source vector $\mathbf{x}$ is not
in the considered cell negligible, but at the same time small enough so that tools employed in classical
quantization problems can be applied.

Recall that, as mentioned earlier, unlike in classical high-rate asymptotics where $R \to \infty$ results in
$\nu(\Lambda) \to 0$, in this case we must explicitly force $s \to 0$, but not "too fast": here, too fast would
mean at a rate equal to or faster than the rate at which $f_{X|Y}$ shrinks as $|\rho| \to 1$. We will do so by setting the
scale factor $s$ to be $s = s(\rho)$, where $s : (-1, 1) \to \mathbb{R}^{+}$ is such that

$$\lim_{|\rho| \to 1} s(\rho) = 0, \qquad \lim_{|\rho| \to 1} \frac{s^2(\rho)}{\sigma_X^2 (1-\rho^2)} = \infty. \tag{4}$$

For example, $s = \sigma_X \sqrt{1-\rho^2}\, \log\left(1/\sqrt{1-\rho^2}\right)$ satisfies these conditions.

2) Performance Metric: Some justification seems necessary at this point for considering high-correlation
asymptotics (i.e., $|\rho| \to 1$), since under this assumption, the side information available uncoded at the
decoder already contains almost all of the information about the source. And indeed, once we are done
with our calculations, we will confirm the (hardly surprising) fact that for any fixed target distortion
$D$, using these proposed quantizers and as $|\rho| \to 1$, the rate required to achieve $D$ vanishes. This is a
condition that must be satisfied by any decent quantizer. However, that is not why we are interested in
this analysis: instead, our goal is to evaluate

$$\lim_{|\rho| \to 1} \frac{d}{D(R)}, \tag{5}$$

where $d$ is the distortion of our quantizers, and $D(R)$ is the Wyner-Ziv rate/distortion function; that is, we
wish to compare the slope of the distortion function for our proposed quantizers at asymptotically high
correlations with that of the Wyner-Ziv bound. This is a meaningful performance metric, as it determines
the rate of decay of distortion relative to the fastest possible decay.^2
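The scaling requirement can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it reads conditions (4) as $s \to 0$ together with $s^2 / (\sigma_X^2(1-\rho^2)) \to \infty$, and it reads the example scaling as $s = \sigma_X \sqrt{1-\rho^2}\,\log(1/\sqrt{1-\rho^2})$; the function names are hypothetical.

```python
import math


def scale(rho, sigma_x=1.0):
    """Example scale factor s(rho) = sigma_x * eps * log(1/eps), eps = sqrt(1 - rho^2)."""
    eps = math.sqrt(1.0 - rho * rho)
    return sigma_x * eps * math.log(1.0 / eps)


def cell_to_spread_ratio(rho, sigma_x=1.0):
    """s^2 / (sigma_x^2 (1 - rho^2)): squared cell scale over conditional variance."""
    s = scale(rho, sigma_x)
    return s * s / (sigma_x ** 2 * (1.0 - rho * rho))


# Correlations approaching 1: 0.99, 0.999, ..., 1 - 1e-8.
rhos = [1.0 - 10.0 ** (-k) for k in range(2, 9)]
scales = [scale(r) for r in rhos]
ratios = [cell_to_spread_ratio(r) for r in rhos]

for r, s, q in zip(rhos, scales, ratios):
    print("rho = %.8f   s = %.3e   s^2 / cond. variance = %.2f" % (r, s, q))
```

The cell scale shrinks while the ratio of cell size to the spread of $f_{X|Y}$ keeps growing, which is the sense in which $s$ vanishes "not too fast".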



B. Asymptotics of the Average Error With Geometrically Similar Coarse and Fine Lattices 

1) A Simpler Expression: To obtain a simpler expression for d than that of eq. (3), we start by 
expanding it in a different way: 



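$$
\begin{aligned}
d &\stackrel{(a)}{=} \int_{\mathbf{x} \in \mathbb{R}^n} \int_{\mathbf{y} \in \mathbb{R}^n} \|\mathbf{x} - \gamma(\mathbf{x},\mathbf{y})\|^2 f_{XY}(\mathbf{x},\mathbf{y})\, d\mathbf{x}\, d\mathbf{y} \\
&= \int_{\mathbf{y} \in \mathbb{R}^n} \left[ \int_{\mathbf{x} \in \mathbb{R}^n} \|\mathbf{x} - \gamma(\mathbf{x},\mathbf{y})\|^2 f_{X|Y}(\mathbf{x}|\mathbf{y})\, d\mathbf{x} \right] f_Y(\mathbf{y})\, d\mathbf{y} \\
&\stackrel{(b)}{=} \sum_{\lambda \in s\Lambda} \int_{\mathbf{y} \in V[\lambda : s\Lambda]} \left[ \int_{\mathbf{x} \in \mathbb{R}^n} \|\mathbf{x} - \gamma(\mathbf{x},\mathbf{y})\|^2 f_{X|Y}(\mathbf{x}|\mathbf{y})\, d\mathbf{x} \right] f_Y(\mathbf{y})\, d\mathbf{y} \\
&\stackrel{(c)}{\approx} \sum_{\lambda \in s\Lambda} \left[ \int_{\mathbf{x} \in \mathbb{R}^n} \|\mathbf{x} - \gamma(\mathbf{x},\lambda)\|^2 f_{X|Y}(\mathbf{x}|\lambda)\, d\mathbf{x} \right] f_Y(\lambda)\, \nu(s\Lambda) \\
&\stackrel{(d)}{=} \left[ \int_{\mathbf{x} \in \mathbb{R}^n} \|\mathbf{x} - \gamma(\mathbf{x},\mathbf{0})\|^2 f_{X|Y}(\mathbf{x}|\mathbf{0})\, d\mathbf{x} \right] \left[ \sum_{\lambda \in s\Lambda} f_Y(\lambda)\, \nu(s\Lambda) \right] \\
&\stackrel{(e)}{\approx} \underbrace{\int_{\mathbf{x} \in V[\mathbf{0} : s\kappa(\Lambda)]} \|\mathbf{x} - \gamma_k(\mathbf{x})\|^2 f_{X|Y}(\mathbf{x}|\mathbf{0})\, d\mathbf{x}}_{\alpha} \qquad (6) \\
&\quad + \underbrace{\sum_{\lambda \in s\kappa(\Lambda) \setminus \{\mathbf{0}\}} \int_{\mathbf{x} \in V[\lambda : s\kappa(\Lambda)]} \|\mathbf{x} - (\lambda + \gamma_k(\mathbf{x}))\|^2 f_{X|Y}(\mathbf{x}|\mathbf{0})\, d\mathbf{x}}_{\beta} \qquad (7)
\end{aligned}
$$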

where:

(a) is again just the definition of average distortion;

(b) follows from partitioning the set of all side information sequences $\mathbf{y}$ into Voronoi cells of the fine
lattice $s\Lambda$;

(c) follows from the assumption that $\nu(s\Lambda)$ is small, and from the continuity of $\int_{\mathbf{x} \in \mathbb{R}^n} \|\mathbf{x} - \gamma(\mathbf{x},\mathbf{y})\|^2 f_{X|Y}(\mathbf{x}|\mathbf{y})\, d\mathbf{x}$
as a function of $\mathbf{y}$;

(d) follows from the symmetry of $f_{X|Y}$ as a function of $\mathbf{y}$;

(e) follows from the fact that $f_Y$ integrates to 1, and from splitting the domain of integration of $\mathbf{x}$ into
Voronoi cells of the sublattice $s\kappa(\Lambda)$.

^2 This type of analysis is similar in spirit to (and inspired by) that of Verdú for modulation schemes operating at asymptotically
low SNRs [52].

Our next goal is to find simpler expressions for $\alpha$ and $\beta$.

To simplify $\alpha$, we observe that this term denotes the MSE incurred when quantizing samples of a
distribution with an $N$-level fixed-rate uniform quantizer, if we assume that the overload cells
of the quantizer occur with negligible probability; this assumption is justified because, for $|\rho| \approx 1$,
sublattice cells are large relative to the spread of $f_{X|Y}$ due to our choice of $s$ in (4). Now, again under
the assumption that $R$ is large, the random shift in the mean of $f_{X|Y}$ given by its dependence on the
unknown parameter $\lambda$ is negligible compared to the size of a sublattice cell. Thus, by choosing a value
of $|\rho|$ close enough to 1, the probability of $\mathbf{x} \notin V[\mathbf{0} : s\kappa(\Lambda)]$ can be made arbitrarily small. This is
illustrated in Fig. 5.




Fig. 5. Illustration (in one dimension) of the concept that, irrespective of a small random shift in the mean introduced by the
unknown side information, a fine quantization of the sublattice cell (thin lines in between thick lines) results in a fine quantization
of the unknown distribution. The true distribution could be any of those illustrated for various unknown vectors $\lambda_k$.

The requirement that the fine and coarse quantizers be geometrically similar lattices results in cells of
the coarse lattice being partitioned uniformly by the fine lattice; this is the optimal quantizer for a source
that is uniformly distributed over a sublattice cell, not for one distributed according to $f_{X|Y}$. Therefore, defining
a new pdf $p(\mathbf{x}) = 1/\nu(s\kappa(\Lambda))$ if $\mathbf{x}$ is in the corresponding sublattice cell, and zero otherwise, we have that

$$\lim_{N \to \infty} N^{2/n} \alpha = G(\Lambda) s^2;$$

this follows from evaluating eqn. (81) in [11, Ch. 2] for the uniform distribution $p$ defined above,
specialized to the lattice $\Lambda$. Therefore, for $N$ large, we can (equivalently) say that

$$\alpha \approx G(\Lambda) s^2 e^{-2R}.$$

Since $\beta \ge 0$, we have that $d \ge \alpha$, and so

$$d \ge G(\Lambda) s^2 e^{-2R}. \tag{8}$$

2) Comparison Against Wyner's Rate/Distortion Bound: Our next step is to evaluate the figure of
merit defined by (5). To this end, consider Wyner's rate/distortion bound [55]:^3

$$D(R) = \sigma_X^2 (1-\rho^2) e^{-2R}. \tag{9}$$

Plugging eqns. (8) and (9) into (5), we get

$$
\lim_{|\rho| \to 1} \frac{d}{D(R)} \;\ge\; \lim_{|\rho| \to 1} \frac{G(\Lambda) s^2 e^{-2R}}{\sigma_X^2 (1-\rho^2) e^{-2R}}
\;=\; G(\Lambda) \lim_{|\rho| \to 1} \frac{s^2}{\sigma_X^2 (1-\rho^2)}
\;=\; \infty;
$$

the divergence of this limit follows from the choice of lattice scaling specified in eqn. (4). Therefore,
when the fine quantizer is constrained to be a lattice that is geometrically similar to the coarse lattice,
the performance of the resulting Wyner-Ziv quantizer is very poor in the asymptotic regime of high
correlations. This observation motivates us to introduce a small modification in our code construction.

C. Asymptotics of the Average Error with a Coarse Lattice and an Optimal Fixed-Rate Fine Quantizer 

1) A Simpler Expression: The suboptimality of the code construction based on two geometrically
similar lattices stems from the fact that sublattice cells are partitioned uniformly, but the source distribution
$f_{X|Y}$ being quantized is not uniform. Therefore, we enlarge the class of codes considered:

• we keep the requirement that the coarse quantizer be a lattice;

• we keep the same quantization algorithm of eqn. (2);

• but we now allow the fine quantizer to be an arbitrary fixed-rate classical vector quantizer.

^3 In Wyner's paper, the bound is given in the form $R(d) = \frac{1}{2} \log \frac{\sigma_X^2 \sigma_U^2}{(\sigma_X^2 + \sigma_U^2)\, d}$ (for the low distortion region), where $\sigma_X^2$ is
the variance of $X$, and $Y = X + U$, where $U$ has variance $\sigma_U^2$. A straightforward manipulation puts Wyner's expression in the
form shown here.

By removing the restriction that the fine quantizer also be a lattice, we can now choose one still with
$N$ reconstruction points, but whose output point density, instead of being uniform, is matched to the
distribution $f_{X|Y}(\mathbf{x}|\mathbf{0})$. As a result, we conclude that there exists a quantizer such that

$$\lim_{N \to \infty} N^{2/n} \alpha = G_n \|f_{X|Y}\|_{n/(n+2)},$$

where $\|f\|_{n/(n+2)} = \left[ \int f^{n/(n+2)}(\mathbf{x})\, d\mathbf{x} \right]^{(n+2)/n}$, and where $G_n$ depends only on $n$ (but not on the source distribution), and is bounded in terms of the standard $\Gamma$ function by

$$\frac{1}{(n+2)\pi}\, \Gamma\!\left(\frac{n}{2} + 1\right)^{2/n} \;\le\; G_n \;\le\; \frac{1}{n\pi}\, \Gamma\!\left(\frac{n}{2} + 1\right)^{2/n} \Gamma\!\left(1 + \frac{2}{n}\right), \tag{10}$$

as follows from eqns. (81) and (82) of [11, Ch. 2]. Hence, for $|\rho| \approx 1$ and for $N$ large, we can approximate
$\alpha$ by

$$\alpha \approx G_n \|f_{X|Y}\|_{n/(n+2)}\, e^{-2R}.$$

To simplify $\beta$, the following estimate is obtained in Appendix A:

[Equation (11): estimate of $\beta$; not recoverable from the extracted text.]

Combining these two estimates, we arrive at a final expression for $d$:

[Equation (12): $d \approx G_n \|f_{X|Y}\|_{n/(n+2)}\, e^{-2R} + \beta$, with $\beta$ replaced by its estimate (11); the explicit form of the second term is not recoverable from the extracted text.]
2) Comparison Against Wyner's Rate/Distortion Bound: Plugging eqns. (9) and (12) into (5), we now
get

$$
\lim_{|\rho| \to 1} \frac{d}{D(R)} \;=\; G_n \lim_{|\rho| \to 1} \frac{\|f_{X|Y}\|_{n/(n+2)}}{\sigma_X^2 (1-\rho^2)} \;+\; \lim_{|\rho| \to 1} \frac{\beta}{\sigma_X^2 (1-\rho^2)\, e^{-2R}},
$$

where $\beta$ stands for its estimate (11).

From eqn. (57) in [58], we have that $\lim_{n \to \infty} \|f_n\|_{n/(n+2)} = e^{2h(f)}$, where $f_n = (f)^n$ is the $n$-dimensional
source distribution, and $h$ denotes differential entropy. We do not know of a way to simplify this expression
for small $n$, so we approximate it with its limit value as $n$ gets large.^4 For the conditional Gaussian
distribution, $h(f) = \frac{1}{2} \log\left(2\pi e\, \sigma_X^2 (1-\rho^2)\right)$, and hence

$$G_n \lim_{|\rho| \to 1} \frac{\lim_{n \to \infty} \|f_{X|Y}\|_{n/(n+2)}}{\sigma_X^2 (1-\rho^2)} = G_n\, 2\pi e.$$

^4 It is important to emphasize that although we consider large blocks to simplify this norm, this does not mean that the
distortion expression thus obtained is only valid for high dimensional quantizers: we can consider long source blocks, in which
small sub-blocks are quantized with low dimensional codes (for example, scalar quantizers), and this form would still apply.

Note as well that the second term vanishes: for $|\rho| \to 1$, from (4) we have that $s^2 / (\sigma_X^2 (1-\rho^2)) \to \infty$,
and thus this expression is dominated by the vanishing term $e^{-s^2 / (2\sigma_X^2 (1-\rho^2))}$. Hence, we conclude that, by
explicitly scaling the quantizers with $s$ satisfying conditions (4),

$$\lim_{|\rho| \to 1} \frac{d}{D(R)} = G_n\, 2\pi e.$$

Finally, since for $n$ large the upper and lower bounds on $G_n$ given in eqn. (10) coincide and take the
value $\frac{1}{2\pi e}$ [11, pg. 58], we see that indeed, as $n \to \infty$, there exist high-dimensional codes for which
this limit can be made arbitrarily close to 1. Hence, asymptotically in rate and correlation, our code
constructions achieve the Wyner-Ziv bound.
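Both ingredients of this limit can be evaluated numerically. The following Python sketch is an illustration only, under two assumptions: the norm is $\|f\|_{n/(n+2)} = [\int f^{n/(n+2)}]^{(n+2)/n}$, which for an $n$-dimensional iid Gaussian density with per-component variance $v$ works out to $2\pi v\,(1 + 2/n)^{(n+2)/2}$ (a short Gaussian-integral calculation, not taken from the text), and the bounds on $G_n$ are the two sides of eqn. (10). The values of `sigma_x` and `rho` are arbitrary.

```python
import math


def gaussian_norm(n, var):
    """Closed form of ||f_n||_{n/(n+2)} for an n-dimensional iid N(0, var) density."""
    return 2.0 * math.pi * var * (1.0 + 2.0 / n) ** ((n + 2) / 2.0)


def zador_bounds(n):
    """Lower and upper bounds on G_n from eqn. (10), computed through log-gamma."""
    log_term = (2.0 / n) * math.lgamma(n / 2.0 + 1.0)
    lower = math.exp(log_term) / ((n + 2) * math.pi)
    upper = math.exp(log_term + math.lgamma(1.0 + 2.0 / n)) / (n * math.pi)
    return lower, upper


sigma_x, rho = 1.0, 0.99                       # illustrative values only
cond_var = sigma_x ** 2 * (1.0 - rho ** 2)     # variance of X given Y

for n in (1, 2, 8, 100, 10_000, 1_000_000):
    lower, upper = zador_bounds(n)
    norm_ratio = gaussian_norm(n, cond_var) / cond_var   # tends to 2*pi*e
    # Bounds on the ratio alpha / D(R) implied by the bounds on G_n.
    print("n = %8d   %.5f <= alpha/D(R) <= %.5f" % (n, lower * norm_ratio, upper * norm_ratio))
```

For $n = 1$ the lower bound reproduces the scalar value $G_1 = 1/12$; as $n$ grows, both bounds approach $1/(2\pi e)$ and the printed ratio approaches 1.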



D. Some Intuitive Remarks 

1) On the Optimality of our Codes, in Hindsight: Informally, these are the key elements contributing 
to the optimality of our codes: 

• The codes are scaled in a way such that, as correlation increases, the tails of the conditional
distribution $f_{X|Y}$ outside a cell of the coarse quantizer become increasingly light.

• At high correlations, our scaling of the codes results in the size of cells in the coarse quantizer
being small. But at high rates, the size of a cell in the fine quantizer is negligible even relative to
the small coarse cells. And the side information is, with high probability, "pinned" within one of
the small fine quantizer cells.

• Because the tails of $f_{X|Y}$ are increasingly light as correlation increases, and $f_{X|Y}$ is not uniform,
an optimal quantizer for a uniform distribution is mismatched to the actual statistics of the data,
thus resulting in a severe penalty in rate. However, this penalty can be eliminated entirely in a very
simple way: only changing the shape of the cells for the fine quantizer is enough. If the output
point density of the fine quantizer is matched to the pinned form of $f_{X|Y}$, this is an optimal code.

Essentially, our construction is asymptotically optimal (in rate and correlation), because we scale the
lattice in a way such that we create multiple copies of $f_{X|Y}$, one within each cell of the coarse lattice,
and we use an optimal code within that cell.
2) On Why $R^*(d) = R_{X|Y}(d)$ for Gaussian Sources: This asymptotic analysis also sheds light on why
there is no rate loss for Wyner-Ziv coding of Gaussian sources, at least in the asymptotic regime of high
rates and high correlations. Note that the conditional distribution $f_{X|Y}$ depends on the side information
$\mathbf{y}$ only in the form of a random shift: this random shift becomes negligible at high rates, but more
importantly, the shape of $f_{X|Y}$ is independent of $\mathbf{y}$. As a result, a single code can be used to quantize the
$f_{X|Y}$'s pinned one within each cell of the coarse lattice. It is this invariance property of the conditional
Gaussian distribution that results in having $R^*(d) = R_{X|Y}(d)$, at least in the asymptotic regime considered
in this section.

IV. Applications in Sensor Networks 

A. Discussion 

Issues in the analysis of performance of wireless networks have received considerable attention in
recent times. To a large extent, interest in these topics has been sparked by an observation made by
Gupta and Kumar: the total throughput that can be carried by one particular class of wireless networks
is only $O(\sqrt{n})$,^5 for a network having $O(n)$ nodes [22]. As a result, each source-destination pair gets
a throughput of $O(1/\sqrt{n})$, i.e., the amount of information that any one individual node can inject into
the network vanishes as the network size increases. The model used for performance analysis in [22]
was conceived as an abstraction for emerging ad-hoc wireless networks, made up of small appliances
(such as laptop computers or microwave ovens or door locks), interconnected via standard air interfaces
(such as Bluetooth or 802.11). In that context, the fact that the capacity available to each node
decreases as more nodes join the network clearly poses serious problems, since there is no reason to
believe that there will be any dependencies in the data generated by each of these devices. These
problems prompted the conclusion in [22] that networks with either a small number of nodes, or with a
small number of connections, may be more likely to find acceptance.

^5 A word on notation. In this section, $n$ denotes the number of nodes in the network, and $N$ denotes block length. This notation
should not be confused with that in the previous section, where $n$ was used to refer to block length, and $N$ to the number of
reconstruction codewords in a code.

In our work, we consider a different type of wireless network: we focus on sensor networks, i.e.,
networks of devices that collect measurements of a process that is "regular" in some sense. For example,
if the sensors measure ozone concentration in the atmosphere, then the values of each measurement will
not be independent in general, but instead will be constrained by an appropriate form of the Navier-Stokes
equations. If the sensors measure temperatures at different locations of a material, the measurements will
be constrained by Fourier's heat equations. And in general, when the sensors sample values of some
random process at different locations, these samples will be constrained by the correlation structure of
the process (see, e.g., [41]). By considering correlated sources we generalize, in what we believe is a very
meaningful way, the setup of [22]: now the amount of information generated by each node is no longer
a constant, but instead depends on the size of the network itself.

B. Network Model 

Consider the following problem setup: 

• There is a source of information, modeled by a process $X_u(k)$: for fixed values of $k$, $X_u(k)$ is a
Brownian motion with parameter $\sigma_X^2$; for fixed values of $u \in [0, 1]$, $X_u(k)$ is an iid sequence. That
is, at a fixed location $u$, iid samples with distribution $\mathcal{N}(0, \sigma_X^2 u)$ are collected in discrete time, and
at a fixed time slot, a Wiener process unfolds in space.

• Network nodes are represented by points on the unit square $[0, 1] \times [0, 1] \subset \mathbb{R}^2$, and are classified
into three groups:

- There are $n$ source nodes $s$, which feed information into the network, uniformly spread on the
left edge of the square.

- There are $n$ destination nodes $d$, which take information out of the network, uniformly spread on
the right edge of the square.

- There are $n$ router nodes $r$, optimally placed in the interior of the square, to maximize network
throughput. These nodes are pure routers: they neither inject nor extract information to/from
the network, and they do not apply any form of coding; they only forward information to other
nodes.

• The $m$-th source collects samples of $X_{m/n}(k)$, and encodes this information prior to sending it to
the $m$-th destination ($m = 1 \ldots n$). The only information available to each source is:

- The observed samples $X_{m/n}(k)$.

- The position in the square of all the nodes.

- The statistics of the entire process $X$.

• Each destination node forwards whatever data it receives to a special node $d$, which jointly decodes
all the data received, and computes an estimate $\hat{X}_u(k)$ of the entire sample path $X_u(k)$ based on
all the decoded samples $\hat{X}_{m/n}(k)$.

• Nodes do not move, and have an unbounded power supply.

• A bit is successfully sent from node $v_i$ to node $v_j$ if (a) $\|v_i - v_j\| \le \Delta_i$, and (b) for all other
transmitting nodes $v_k$, $\|v_k - v_j\| > \Delta_k$. $R$ bits per channel use can be transmitted over any link.

• Routing and power control are optimally configured to maximize network throughput.

Note that in this model we explicitly rule out the possibility of source nodes exchanging information 
to cooperate in the encoding of their observations. Note also that routers only forward data, but do not 
apply any form of coding. That is, encoding is distributed among the sensors, data is carried over the 
network by relay nodes, and decoding is performed at a central location. 

We should point out that our model is different from the model of Gupta and Kumar [22]: whereas in
their model they consider $n$ nodes which serve as transmitters/receivers/relays all in a single device, we
break up each device into three pieces, and consider $n$ transmitters, $n$ receivers, and $n$ relays. However,
this is not a fundamental difference: as long as we keep the same number of all three types of devices, the
two models are essentially the same, and therefore their results on the property of vanishing throughputs
as $n \to \infty$ still hold for our model. The idea of splitting the devices into three separate units is to model
a situation in which data is captured at some location, is transported over an ad-hoc network, and an 
estimate of the field of measurements is formed at a remote location. 

C. Encoding/Decoding Mechanics in Large Networks 

Clearly, a network with a finite number of nodes and with communication links of finite capacity
among nodes can transport only a finite amount of information. Therefore, exact reconstruction of the
Brownian field $X_u(k)$ will not be possible in general, and a key issue then is that of understanding the
rate/distortion tradeoffs involved. A thorough study of this new rate/distortion problem lies outside the
scope intended for this paper, and we will deal with this problem elsewhere. Of interest in this paper,
however, is a result that relates the ability of the central destination node $d$ to estimate the Brownian field
$X_u(k)$ to both the number of nodes in the network and the capacity of the individual network links.
Indeed, we have that under the assumption of a large (but still independent of network size) link capacity
$R$, for any $\epsilon > 0$ and $1 - \epsilon < \rho < 1$, there exists a large enough network of size $n$ nodes, such that

[displayed bound missing from the extracted text]

uniformly for $\frac{m}{n}$ in the closed interval $\left[\frac{1}{n(1-\rho^2)}, 1\right]$, where $m \le n$ is an integer, for all time slots $k$, and
for almost all sample paths of the field.

Essentially, what this result states is that, under the assumption of a large network and with links of
high capacity, it is possible for $d$ to estimate the sample paths of $X$ with arbitrarily small error. That
accurate estimation is possible is indeed surprising to us, given the fact that the amount of information
per sample that the network can carry vanishes [22]; fortunately, so does the information content per
sample, and that is what we can take advantage of.

1) Placement of Nodes and Scheduling of Transmissions: First of all, we give one particular distribution
of routers in the plane and one particular algorithm for scheduling transmissions.

Assume $\ell = \sqrt{n}$ is an even integer, and define:

• The sources are located at coordinates (0, y^), and the destinations at coordinates (1, for i = l...n. 

• There are exactly n routers, located at coordinates (^^ + |) ^ + |)> for z, j = 0, 1, — 1. 

• The transmission radius for the source nodes is A = and for the routers it is A = j.^ 

In order to present an algorithm to schedule transmissions over time, we need some definitions. First, 
divide the square [0, 1] X [0, 1] C into (. sets defined by 

g{i) 



^ > * X [0, 1] 



n n 



(i = !...£). Within each 5^*), there are: 

• ^ source nodes, at coordinates ( 0, ('~^)^+'" 1 for m = {)..! — 1. 



destination nodes, at coordinates ( 1, i)i.+m ] foj- = o...£ — 1. 



• I router nodes, at coordinates + ^ + |), for A; = 1...^. 

Next, we divide the router nodes into three groups $g_0, g_1, g_2$: a router falls in $g_j$ if its index $k$ is congruent to $j$
(mod 3). Source nodes all belong to the group $g_0$. Finally, we give an algorithm to schedule transmissions:

• Time is discrete, and starts at 0. At even time slots, allow transmissions of nodes in the $S^{(i)}$'s for which
$i$ is even; at odd time slots, allow transmissions of nodes for odd $i$'s.

• Each $S^{(i)}$ keeps its own clock $\tau_i$, which advances only when transmissions from this $S^{(i)}$ are allowed
to proceed: when $\tau_i \equiv 0$ (mod 3), $g_0$ sends; when $\tau_i \equiv 1$ (mod 3), $g_1$ sends; when $\tau_i \equiv 2$
(mod 3), $g_2$ sends. And source nodes send only once every $\ell$ available slots, cycling through
them in round-robin order.

An illustration of the placement and divisions of nodes, and of the mechanics of the algorithm, is 
shown in Fig. 6. 

^6 Recall that destination nodes do not communicate over the shared wireless medium with the central decoder; they only
receive data that way. Therefore, no transmission range needs to be specified in their case.

Fig. 6. An example of the placement and division of nodes, and scheduling of transmissions, for $n = 16$ ($\ell = 4$). Black dots
represent nodes: 16 sources on the left edge of the square, 16 routers inside the square, 16 destinations on the right edge of the
square. A source sends data to a destination on the same horizontal line. Thin solid lines joining nodes are routes. The sets $S^{(i)}$
and the groups $g_j$ are indicated with dotted lines. Active transmissions are indicated with a thick arrow, and the circles around
each indicate transmission ranges. The active transmissions in this picture correspond to an odd time slot (only nodes within
$S^{(1)}$ and $S^{(3)}$ are sending), and the group $g_0$ is active.



2) Throughput per Node is $\frac{R}{6\sqrt{n}}$: The calculation of throughput proceeds in three steps:

1) Each group $S^{(i)}$ is scheduled for transmission in only $\frac{1}{2}$ of the available time slots. Among these
slots, only $\frac{1}{3}$ are available for transmission by $g_0$, the group that contains source nodes. When this
group is scheduled, only one in every $\ell$ slots is available to a particular node. And when a particular
node finally gets its chance to inject a message into the network, it injects $R$ bits (equal to link
capacity). Therefore, the total number of bits injected by any one source node per unit of time is
$\frac{1}{2} \cdot \frac{1}{3} \cdot \frac{1}{\ell} \cdot R = \frac{R}{6\sqrt{n}}$.

2) By construction, there is never more than one packet of $R$ bits in the buffer of any router.

3) Also by construction, there is never more than one active transmission within range of any receiver.

So, from 1 we have that $\frac{R}{6\sqrt{n}}$ bits per time slot are injected into the network, from 2 we have that there
is no buildup of packets in any one queue, and from 3 we have that packets are never lost or delayed.
Therefore, all injected bits reach their destination, and hence the throughput is $\frac{R}{6\sqrt{n}}$ bits per time slot per node.
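The counting argument in step 1 can be replayed with a short simulation of the schedule as seen by a single source node. This is a simplified sketch under stated assumptions: only the source's own transmission opportunities are modeled (router forwarding, buffers, and interference are taken for granted, as in steps 2 and 3), and the function and parameter names are hypothetical.

```python
import math


def source_injections(n_nodes, strip_index, source_index, total_slots):
    """Count how many times one source gets to transmit in `total_slots` slots.

    strip_index  : index i of the set S^(i) that contains the source
    source_index : position (0 .. ell-1) of the source in the round-robin order
    """
    ell = math.isqrt(n_nodes)
    assert ell * ell == n_nodes, "n must be a perfect square"
    local_clock = 0      # tau_i: advances only when S^(i) is allowed to transmit
    g0_turns = 0         # number of slots so far in which group g0 was scheduled
    sent = 0
    for t in range(total_slots):
        if t % 2 != strip_index % 2:
            continue                      # strips of the other parity own this slot
        if local_clock % 3 == 0:          # group g0 (which holds the sources) sends
            if g0_turns % ell == source_index:
                sent += 1                 # this source's turn in the round robin
            g0_turns += 1
        local_clock += 1
    return sent


n, link_capacity = 16, 8.0                # illustrative values; ell = 4
slots = 2 * 3 * 4 * 100
sent = source_injections(n, strip_index=2, source_index=1, total_slots=slots)
throughput = link_capacity * sent / slots
print("bits per slot per node:", throughput)
print("R / (6 sqrt(n))       :", link_capacity / (6 * math.sqrt(n)))
```

The simulated rate matches $\frac{1}{2} \cdot \frac{1}{3} \cdot \frac{1}{\ell} \cdot R$ exactly whenever the number of slots is a multiple of $6\ell$.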

3) Use of Codes with Side Information: So far we have a network in which there is no loss of data,
and which can carry a total of $\frac{R}{6\sqrt{n}}$ bits per time slot per node. And we collect one sample of the Brownian
field $X$ per time slot at each source node. Therefore, we have $\frac{R}{6\sqrt{n}}$ bits per sample to encode a block of
$N$ samples, for which the network guarantees delivery.

Consider encoding a block of samples $X^N_{m/n} = [X_{m/n}(0) \ldots X_{m/n}(N-1)]$ at the $m$-th source node.
Trivially, we have that $X^N_{m/n} = X^N_{(m-1)/n} + (X^N_{m/n} - X^N_{(m-1)/n})$. From standard properties of Wiener
processes, we have that $X^N_{m/n}$ and $X^N_{(m-1)/n}$ are jointly Gaussian, and that the increment has distribution
$\mathcal{N}(0, \frac{\sigma_X^2}{n} I)$, independent of $X^N_{(m-1)/n}$. If $X^N_{(m-1)/n}$ were available at the $m$-th encoder, the encoding procedure would
be trivial: use standard codes for an iid Gaussian source to send this increment. But without the reference
value $X^N_{(m-1)/n}$, node $m$ cannot compute that increment, which is the only "new" information at location $\frac{m}{n}$.

Our encoding procedure is as follows: we encode $X^N_{m/n}$ using the codes developed in earlier sections,
assuming the side information $X^N_{(m-1)/n}$ is available at the decoder. The relevant statistics are:

$$X^N_{(m-1)/n} \sim \mathcal{N}\left(0, \sigma_X^2 \tfrac{m-1}{n} I\right), \qquad X^N_{m/n} \sim \mathcal{N}\left(0, \sigma_X^2 \tfrac{m}{n} I\right), \qquad \rho = \sqrt{1 - \tfrac{1}{m}}.$$
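The correlation between samples at adjacent locations can be checked by simulation. The sketch below is a minimal Monte Carlo illustration, assuming a Wiener process in the spatial variable with independent Gaussian increments of variance $\sigma_X^2/n$ between neighboring nodes, and assuming the correlation coefficient between locations $(m-1)/n$ and $m/n$ is $\sqrt{1 - 1/m}$; function names, trial count, and seed are arbitrary choices.

```python
import math
import random


def adjacent_correlation(m):
    """Correlation between samples at locations (m-1)/n and m/n, for m >= 2."""
    return math.sqrt(1.0 - 1.0 / m)


def empirical_adjacent_correlation(m, n, sigma=1.0, trials=50_000, seed=0):
    """Estimate the same correlation by building each sample from increments."""
    rng = random.Random(seed)
    step = sigma / math.sqrt(n)          # standard deviation of one spatial increment
    sxx = syy = sxy = 0.0
    for _ in range(trials):
        prev = sum(rng.gauss(0.0, step) for _ in range(m - 1))   # value at (m-1)/n
        cur = prev + rng.gauss(0.0, step)                        # value at m/n
        sxx += prev * prev
        syy += cur * cur
        sxy += prev * cur
    # Both variables have zero mean, so no centering is needed.
    return sxy / math.sqrt(sxx * syy)


for m in (2, 5, 10):
    print(m, adjacent_correlation(m), empirical_adjacent_correlation(m, n=20))
```

The correlation does not depend on $n$ or $\sigma_X$, only on the index $m$ of the node, and it increases toward 1 as $m$ grows.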

D. Distortion Computation 

Next we turn to the computation of distortion for this proposed coding strategy. Note that since the side
information used to decode the data generated by one node is the data available at previous nodes, and
since decoding errors can indeed occur with nonzero probability (and thus, in the large-network regime,
will occur), an important issue that needs to be addressed is the effect of decoding errors on the overall
achieved distortion.

We proceed in two steps: first we compute the distortion resulting in the case when no decoding errors
occur, and then we compute the increase in distortion due to decoding errors.

1) Distortion Assuming No Decoding Errors: Consider a fixed location $\frac{m}{n}$ ($1 \le m \le n$), a fixed
desired correlation value $\rho$ based on which a large enough value of $n$ is determined, and assume that no
decoding errors occur in decoding the samples at the previous locations.

In Section IV-C.3 above, we argued that we can use codes with side information to effectively
approximate the performance of a genie-aided encoder capable of sending the increments at each node.
We would like to point out now that in our decoder, the side information is itself quantized with the
coarse lattice. As a result, as long as $X_{(m-1)/n}$ and $\hat{X}_{(m-1)/n}$ fall in the same sublattice cell, the reconstruction
$\hat{X}_{m/n}$ is as good as if it were based on uncoded side information. This is illustrated in Fig. 7.




Fig. 7. To illustrate the robustness of the proposed quantizers to small amounts of quantization noise in the side information:
as long as the side information falls within a sublattice cell (roughly indicated as the shaded region in this picture), using coded
or uncoded side information does not make a difference. In this case, $X_{m-1}$ is the sample at the previous location, used as side
information for the sample $X_m$ at the current location.

Thus we conclude that, provided no decoding errors occur in any of the previous samples, and based 
on the results in Section III, we can approximate the distortion in the reproduction of each sample by 
Wyner's rate/distortion bound: 

D:^ < aif(l-p2)e-^, 
Note that the inequality in this case is because there will be nodes operating with a correlation value
higher than the specified $\rho$, and for these values the distortion will be even lower than this. The location-dependent
correlation coefficients $\rho_{m-1,m}$ between adjacent samples form a monotonically increasing sequence
$\sqrt{1 - 1/m} \to 1$ as $m \to \infty$. A trivial manipulation shows that for all $m \ge \frac{1}{1-\rho^2}$, $\rho \le \rho_{m-1,m} < 1$,
and therefore all node locations $\frac{m}{n}$ in the closed interval $\left[\frac{1}{n(1-\rho^2)}, 1\right]$ will have correlation values at least
$\rho$. Now, since $m \le n$, by choosing $n$ large enough we can make $\frac{1}{n(1-\rho^2)}$ arbitrarily close to zero.
So we see that the distortion bound above holds uniformly for almost all samples in a large network.
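The "trivial manipulation" and its consequence for large networks can be made concrete. The following sketch assumes the adjacent-node correlation $\sqrt{1 - 1/m}$ stated above and finds, by direct search, the first node index whose correlation reaches a target $\rho \in (0, 1)$; the helper names are hypothetical.

```python
import math


def adjacent_correlation(m):
    """Correlation between the samples at nodes m-1 and m."""
    return math.sqrt(1.0 - 1.0 / m)


def first_index_meeting(rho):
    """Smallest m with sqrt(1 - 1/m) >= rho, for 0 < rho < 1 (direct search)."""
    m = 1
    while adjacent_correlation(m) < rho:
        m += 1
    return m


def excluded_fraction(rho, n):
    """Fraction of the n nodes whose correlation falls below the target rho."""
    return min(1.0, (first_index_meeting(rho) - 1) / n)


rho = 0.9
m_star = first_index_meeting(rho)
print("threshold 1/(1 - rho^2) =", 1.0 / (1.0 - rho * rho), " first index =", m_star)
for n in (10, 100, 1000, 10_000):
    print("n = %6d   excluded fraction = %.4f" % (n, excluded_fraction(rho, n)))
```

For a fixed target $\rho$ the number of excluded nodes is constant, so the excluded fraction of the interval $[0, 1]$ shrinks like $1/n$.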

At locations $u$ in which there is no sample collected (i.e., any location in an open interval $\left(\frac{m-1}{n}, \frac{m}{n}\right)$),
we need to interpolate $X_u$: we define $\hat{X}_u = \hat{X}_{(m-1)/n}$, where $(m-1)/n < u < m/n$.^7 In this case,

Du < + ^, 

since the interpolation error is at most the size of an increment between samples, and this increment has
variance $\sigma_X^2/n$. Assume now that the sample path $X_u(k)$ is continuous at $u$:

• Because $n$ is large, and for a fixed $k \in \mathbb{N}$, we have a dense sampling of $X_u(k)$, $0 < u < 1$.

• Because $R$ is large, encoded samples $\hat{X}_u$ available at the decoder are close to the original value $X_u$,
i.e., $\hat{X}_u \approx X_u$, $u = \frac{m}{n}$.

• Because $X_u$ is continuous and $n$ is large, we have that interpolated samples $\hat{X}_u \approx \hat{X}_{(m-1)/n}$
($\frac{m-1}{n} < u < \frac{m}{n}$), for all $0 < u < 1$.

Therefore, the bound on $D_u$ above holds at all points of continuity of $X_u$. But finally, since almost all paths
of a Wiener process are continuous [45], we conclude that

Du < ^i(^(l-p')e-^ + i) (a.e.), 

where $(m-1)/n < u < m/n$, and $1 \le m \le n$.

2) Distortion Excess Due to Decoding Errors: In the subsection above we obtained an expression 
for the distortion in the reconstruction of the sample paths assuming that decoding errors never occur. 
This is clearly a lower bound on the achievable distortion. But we still need to account for the distortion 
increase that results from the increasingly likely (as $n \to \infty$) event of a decoding error. Our next goal
is to show that, in large networks, this excess distortion is negligible compared to the distortion above
induced by the quantizers.

Consider two definitions:

• $T_m$ is a random variable such that $T_m = l$ denotes the event in which $l$ nodes (out of the $m$ right
before the node at location $\frac{m}{n}$) make a decoding error. Since, conditioned on the side information
being correct, errors are independent at each node, $T_m \sim B(m, p_n)$: a binomial distribution with
parameters $m$ = number of previous nodes, and $p_n$ = probability of decoding error given that there
are $n$ nodes in the network.

• We refer to the term $\beta$ defined by eqn. (7) as the excess distortion at node $m$.

Both these definitions are illustrated in Fig. 8.

^7 Note that we could use better interpolators here than a simple zero-order hold. But already with this rather simple-minded
rule we get the sought result of vanishing estimation error, and hence we keep it for simplicity.







Fig. 8. To illustrate the concept of excess distortion. In this picture we show the reconstruction that would result when no 
decoding errors occur (bottom sample path), and the effects of decoding errors (jumps of average size $\sqrt{\beta}$, as defined in eqn. (7),
after each decoding error). Note that these errors do not necessarily add up coherently from node to node, as illustrated in this 
picture - however, taking them to behave in this way provides a valid upper bound on the total excess distortion they induce. 

Consider now the distortion in a reconstruction of $X_m$, based on coded side information:

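$$
\begin{aligned}
D_{m/n} &\overset{(a)}{\le} \sum_{l=0}^{m} \left(\alpha_n + l^2 \beta_n\right) P(T_m = l) \\
&\overset{(b)}{=} \alpha_n + \beta_n \left(m p_n (1 - p_n) + m^2 p_n^2\right) = \alpha_n + \beta_n m p_n \left(1 + (m-1) p_n\right) \\
&\overset{(c)}{\le} \alpha_n + \beta_n n p_n \left(1 + (n-1) p_n\right)
\end{aligned}
$$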

W -tV 2 2 

an + e '"xn p^ 

where: 

(a) follows from eqn. (7), and from the fact that if $l$ errors occurred before the decoding of the $m$-th
sample, on average each error contributes distortion $\beta_n$ and in the worst of cases all these errors add
up coherently (the dependence of $\alpha$ and $\beta$ in eqn. (7) on $n$ is highlighted by adding the subscript);

(b) follows from the binomial distribution of $T_m$;

(c) follows from the fact that the expression above must hold for all $1 \le m \le n$;

(d) follows from the fact that for n large, we can neglect the polynomial terms associated with the 



negative exponential, and from the fact that p = ^1 — ^. 
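
Steps (b) and (c) rest on the second moment of a binomial random variable, $E[T_m^2] = m p_n (1 - p_n) + m^2 p_n^2 = m p_n (1 + (m-1) p_n)$, which is increasing in $m$. The short sketch below checks this identity numerically; the function names are ours and purely illustrative, and the values of $\alpha$, $\beta$ and $p$ used with it are arbitrary inputs, not quantities computed in this paper.

```python
from math import comb

def second_moment(m, p):
    """E[T^2] for T ~ B(m, p), computed directly from the probability mass function."""
    return sum(l * l * comb(m, l) * p**l * (1 - p) ** (m - l) for l in range(m + 1))

def closed_form(m, p):
    """m p (1 - p) + m^2 p^2, rewritten as m p (1 + (m - 1) p)."""
    return m * p * (1 + (m - 1) * p)

def distortion_bound(alpha, beta, m, p):
    """alpha + beta * E[T^2]: the bound obtained when l errors add up
    coherently, so that l errors contribute l^2 * beta."""
    return alpha + beta * closed_form(m, p)

if __name__ == "__main__":
    m, p = 10, 0.3
    print(second_moment(m, p), closed_form(m, p))
```

Because `closed_form(m, p)` grows with `m`, replacing $m$ by $n$ as in step (c) can only increase the bound.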
Clearly, as $n \to \infty$, both $\alpha_n \to 0$ and $\beta_n \to 0$. But again, this is not an interesting observation. The
interesting observation in this case is that still in the presence of coded side information and decoding 
errors, in the regime of high correlations, is negligible compared to Q!„, and — a„: 

for any e > and n large enough. But we also have > 1 (since > 0). Thus, the excess 

distortion due to the use of coded side information and possible decoding errors is negligible compared 
to the distortion induced by the quantizers themselves. 

To conclude this section, we would like to point out that there is an interesting tradeoff in this analysis, 
that works out favorably for us. Note that by increasing the number of nodes, we increase the number 
of places at which errors can occur, and therefore the probability that some node will make a decoding 
error is increased. However, as the number of nodes increases, the correlation between their measurements 
increases as well, and therefore the size of errors is reduced. And as the previous analysis shows, a linear 
increase in the number of nodes results in an exponential decrease in the size of each error - hence, error
propagation is not a problem in this setup. 
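
The tradeoff described above can be illustrated with a toy calculation. This is only a sketch under loudly stated assumptions: the excess term is taken to be $\beta_n n p_n (1 + (n-1) p_n)$ as in step (c), and $\beta_n$ is replaced by a hypothetical exponential decay $\beta_0 e^{-cn}$ with arbitrary constants $\beta_0$ and $c$; the actual rate follows from the estimate of $\beta$ in the Appendix and is not reproduced here.

```python
import math

def excess_bound(n, p_n, c=0.5, beta0=1.0):
    """Toy excess-distortion term beta_n * n * p_n * (1 + (n - 1) * p_n).
    ASSUMPTION: beta_n = beta0 * exp(-c * n) is a hypothetical stand-in for
    the exponential decay of the error size; c and beta0 are arbitrary."""
    beta_n = beta0 * math.exp(-c * n)
    return beta_n * n * p_n * (1 + (n - 1) * p_n)

if __name__ == "__main__":
    # Worst case p_n = 1: the polynomial factor is n^2, yet the product still vanishes.
    for n in (1, 4, 10, 30, 100):
        print(n, excess_bound(n, 1.0))
```

Even with the pessimistic choice $p_n = 1$, the polynomial growth in the number of error locations is eventually dominated by the exponential decay of the error size, although for small $n$ the term may first increase.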

V. Conclusions

In this paper we presented our work on the design and performance analysis of codes for the problem 
of rate distortion with side information, and on the application of those codes in the context of a problem 
of data compression for sensor networks. First, we gave concrete constructions for the nested codes 
studied by Shamai/Verdú/Zamir in [42], [60], effectively answering an open question raised in [60]. Then
we studied the distortion performance of our codes, under the assumption of high correlation between
the source and the side information and of high coding rates: there we showed that our codes attain the
theoretically optimal distortion decay established by Wyner and Ziv [55], [56]. Finally we computed an
upper bound on the error made in estimating a Brownian field based on measurements collected by very
"cheap" devices and delivered over a wireless network. In this case, even though the per-node throughput 
of the network vanishes as its size increases, and even if the nodes are not allowed to exchange any 
information at all, we showed how arbitrarily accurate estimation of the remote field is possible. To 
conclude the paper, we would like to comment on some issues that follow from our work. 

Concerning the problem of source estimation, in the presence of constraints on the available data 
imposed by the wireless network: 

• The Brownian model for the source considered in this work is probably one of the worst cases we 
could have considered, in the sense that the regularity conditions satisfied by this process are minimal. 
For example, almost all of its sample paths are indeed continuous at almost all points (something 
we did use in our analysis); but at the same time, almost all sample paths are not differentiable at 
almost all points. Furthermore, the crucial assumption of high-resolution quantization that enabled 
us to apply our codes in the presence of coded side information cannot be justified for processes 
with increments of variance $O(n^{-1+\epsilon})$, for any $\epsilon > 0$ (compare this to the $O(n^{-1})$ variance of the
increments of the model we considered).

• Interesting questions arise if we consider processes more regular than Brownian motion: consider 
for example the case when $X_u$ is a bandlimited signal (since $X_u$ is compactly supported, take its
periodic extension). If the samples $X_{m/n}$ were available at the decoder without distortion, it follows
from Shannon's sampling theorem that a network of finite size is enough to achieve a reconstruction 
with zero distortion. However, this would require network links of infinite capacity. For any finite 
value of R, there are tradeoffs to explore between the number of nodes in the network (i.e., the 
sampling rate) and the capacity of the network links (i.e., the accuracy in the representation of each 
sample), since economic constraints may favor one or the other option. This problem has received 
considerable attention in the signal processing and harmonic analysis literature [16], [17], [19], [26],
[47]. 

Concerning coding/quantization. Whereas our asymptotic analysis was performed only for jointly 
Gaussian sources and MSE distortion, it would be interesting to learn something about the performance 
of the proposed quantizers for sources with non-Gaussian statistics and/or other distortion measures. An 
interesting result of Zamir states that, although the gap between $R_X(d)$ and $R_{X|Y}(d)$ can be unbounded,
the gap between the Wyner-Ziv rate/distortion function $R^*_X(d)$ and $R_{X|Y}(d)$ is bounded, and actually
quite small in some cases: 0.5 bits/sample for arbitrary source statistics and MSE distortion, and 0.22 
bits/sample for a binary source with Hamming distortion [59]. In our opinion this is an interesting issue 
because, should a result similar to Zamir's hold for the performance of our codes, this would immediately 
allow us to conclude that arbitrarily accurate estimation is possible not just for jointly Gaussian sources, 
but for any source statistics. And even if we do not have a formal proof, it certainly seems plausible to 
us that this may be so. 
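
For the jointly Gaussian case with MSE distortion, the unbounded gap between $R_X(d)$ and $R_{X|Y}(d)$ is easy to see from the standard closed forms $R_X(d) = \frac{1}{2}\log_2(\sigma_X^2/d)$ and $R_{X|Y}(d) = \frac{1}{2}\log_2(\sigma_X^2(1-\rho^2)/d)$, each clipped at zero. The sketch below evaluates both; the function names are ours, and the formulas are the textbook Gaussian expressions rather than anything specific to the codes of this paper.

```python
import math

def rate_no_side_info(var_x, d):
    """R_X(d) in bits/sample for a Gaussian source with variance var_x (MSE distortion)."""
    return max(0.0, 0.5 * math.log2(var_x / d))

def rate_conditional(var_x, rho, d):
    """R_{X|Y}(d) in bits/sample when (X, Y) are jointly Gaussian with
    correlation coefficient rho: the conditional variance is var_x * (1 - rho^2)."""
    return max(0.0, 0.5 * math.log2(var_x * (1.0 - rho**2) / d))

def rate_gap(var_x, rho, d):
    """Rate saved by exploiting the side information, in bits/sample."""
    return rate_no_side_info(var_x, d) - rate_conditional(var_x, rho, d)

if __name__ == "__main__":
    for rho in (0.5, 0.9, 0.99, 0.999):
        print(rho, rate_gap(1.0, rho, d=1e-4))
```

For small enough $d$ the gap equals $\frac{1}{2}\log_2\frac{1}{1-\rho^2}$, which grows without bound as $|\rho| \to 1$; this is the regime in which side information at the decoder matters most.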

Concerning the type of asymptotics developed in this work. Tools employed for theoretical performance 
analysis in source coding problems can be roughly classified into two main groups: 

• Large-block asymptotics, as pioneered by Shannon [43]. 

• High-rate asymptotics, as pioneered by Zador, Gersho and others [18], [58]. 

The asymptotics we considered in this work are of neither type - instead, we focused on high-correlation 
asymptotics. And we believe this type of analysis is one particularly well suited for a new class of source 
coding problems, that originate in the context of sensor networks. This paper presents one such analysis 
for a simple toy problem involving a Brownian process. More of our work along these lines can be found 
in [28], [38], [41]. 

To conclude, we would like to comment on the nature of our contributions in this paper. Since the 
seminal work of Gupta and Kumar [22], most of the theory work on wireless networks appears to have 
been driven by a desire to find ways to understand, and if possible circumvent, the fact that the per-node 
throughput of the network vanishes as the number of nodes grows. Implicit in previous work seems
to have been the assumption that each node has a constant amount of information to transmit,
irrespective of the network size: in this case, the fact that the throughput per node decreases as the 
network size increases does indeed pose serious problems. However, we feel the asymptotic analysis 
of [22] is better suited to "networks of small sensors" than to "networks of laptop computers": whereas 
there are only so many laptops that one may want to have in a single room, much higher densities of 
small sensing nodes are conceivable. Yet it is very high densities of nodes that the asymptotic analysis
of [22] suggests to us. Now, in the context of sensor networks, the vanishing-throughput property of some 
wireless networks is much less of a problem. As an application for our codes with side information, we 
illustrated an instance of a class of wireless networking problems in which, as the size of the network 
grows, the amount of information generated by each transmitter decays at the same speed as the per-node 
throughput does. Hence, contrary to the conclusions suggested in [22], designers of these networks should 
be encouraged to consider very large numbers of nodes, for doing so may result in improved quality of 
the signals reconstructed at the receivers, and it may also make more economic sense. 

Appendix

A. Bounding $\beta$

Recall from Section III-B,

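$$
\beta = \frac{1}{n} \sum_{\lambda \in s\kappa(\Lambda) \setminus \{0\}} \int_{x \in V[\lambda : s\kappa(\Lambda)]} \left\| x - (\lambda + \gamma_k(x)) \right\|^2 f_{X|Y}(x|\xi)\, dx,
$$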

for any $\xi \in V[0 : s\Lambda]$. Our goal next is to give an estimate for $\beta$.

Since each term of the sum is positive, we have a trivial lower bound: $\beta > 0$. As for an upper bound:

P = E / l|x-(A + 7.(x))| ^^ e-^ll^-^^ll^dx 



< 



i y [ ||x|p . e-^ll^--^«ll^dx 

+ / ||A + 7fc(x)|p ^ e"^(^"^''"^^"'dx 

Jy[A:sK(A)] [27rfJ^(l - 2 

1 y 2||A||2 -i ^ e"^^r^""'' f / dx^ 

1 2K5k(A)) ^ IIM|2.-i^-^ll^ll 



"[2vr4(l-p2)]^ 



e 



AGsfc(A)\{0} 

21.MA)) ^ ||^^||2^__^^||,A|| 



"[27ra2.(l-p2)]t 

L A V AeK(A)\{0} 

1 2.(.K(A)).^ 1^ (13) 

where: 

(a) is just a substitution for the conditional Gaussian distribution; 

(b) follows from the fact that ||o - 6|p < ||a|p + 

(c) is because of two reasons: under the assumption that sublattice cells are small, we have $\|x\|^2 \approx \|\lambda\|^2$
(when $x \in V[\lambda : s\kappa(\Lambda)]$); and under the further assumption that $R$ is large, $\|\gamma_k\|^2$ is negligible
compared to $\|\lambda\|^2$ (when $\lambda \ne 0$), and $\xi \approx 0$ (when $\xi \in V[0 : s\Lambda]$);

(d) follows from defining $N_m(\kappa(\Lambda))$ as the number of points $\lambda \in \kappa(\Lambda)$ such that $\|\lambda\|^2 = m$.

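To find a useful estimate for this sum, we need to bound $N_m(\kappa(\Lambda))$. One simple such bound is:

$$
N_m(\kappa(\Lambda)) \le \frac{\text{surface of an } n\text{-dimensional sphere of radius } m}{\text{volume of an } (n-1)\text{-dimensional sphere of radius } \tfrac{1}{2}\,(\text{smallest separation between sublattice points})}.
$$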

Note: wlog, we can take norms to be integers. If this is not the case, we can always form a (countable) list of all the norms 
that appear in $\kappa(\Lambda)$, and take $m$ to be an index in this list.

This bound follows from the fact that the highest density of lattice points on the surface of a sphere
cannot be higher than if we assume a perfect tessellation of this $(n-1)$-dimensional surface into $(n-1)$-dimensional
spheres whose radius is $\tfrac{1}{2}$ of the smallest separation between sublattice points. Using standard
formulas [11], we find that

n— 1 

Py—[. Therefore, 



for appropriate constants 


Cn and dn, and e„ 


(a) 

/3 < 


^ 2z^(sK(A))enS^ 
'^[2vrai(l-p2)]- 








'^[2vr4(l-p2)]- 


(1) 


^ 2z^(sK(A))e„s^ 




-[27r4(l-p2)]- 


(c) 
< 


^ 2v{sK{K))enS^ 
-[2vr4(l-p2)]t 


(d) 


1 2z/(sK(A))e„s2 




"[2vrai(l-p2)]t 


< 


e 



- d4f ) 

oo 
oo 



;y-m+(n-l) log(m) 

m=l 



(,^-l)log(^.) 




(14) 



where: 

(a) follows from replacing the estimate for $N_m(\kappa(\Lambda))$ in eqn. (13);

(b) follows from simple manipulations, and defining = Q; 

(c) follows from observing that l2&^ < f-.^_ , for close enough to 1; 

(d) follows from evaluation of the sum of a power series; 

(e) where this holds for all values of $\rho$ such that $\rho_0 < |\rho| < 1$, for a constant $\rho_0$ that depends on $\epsilon$,
since, from (4), we have $s/(\sigma_X\sqrt{1-\rho^2}) \to \infty$; thus convergence is exponential in $\rho$.

Thus, $0 < \beta < \epsilon$, for all $\epsilon > 0$ and all $|\rho|$ close enough to 1. Hence, eqn. (14) defines an asymptotically
good estimate of $\beta$.
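
The behavior of sums of the form $\sum_m N_m\, m\, e^{-m/(2v)}$, which drive the bound on $\beta$, can be explored numerically. The sketch below is an illustration only: it uses the integer lattice $\mathbb{Z}^2$ as a stand-in for $\kappa(\Lambda)$, counts the points $N_m$ with squared norm $m$ by brute force, and treats $v$ as a hypothetical normalized conditional variance (playing the role of $\sigma_X^2(1-\rho^2)/s^2$). The names `shell_counts` and `tail_sum` are ours.

```python
import math

def shell_counts(max_norm2):
    """N_m for the lattice Z^2: the number of nonzero integer points whose
    squared norm equals m, for every m <= max_norm2 that actually occurs."""
    r = math.isqrt(max_norm2)
    counts = {}
    for a in range(-r, r + 1):
        for b in range(-r, r + 1):
            m = a * a + b * b
            if 0 < m <= max_norm2:
                counts[m] = counts.get(m, 0) + 1
    return counts

def tail_sum(v, max_norm2=400):
    """Truncated sum over nonzero lattice points of ||lambda||^2 * exp(-||lambda||^2 / (2 v)).
    ASSUMPTION: v is a normalized conditional variance; Z^2 replaces kappa(Lambda)."""
    return sum(n_m * m * math.exp(-m / (2.0 * v))
               for m, n_m in shell_counts(max_norm2).items())

if __name__ == "__main__":
    for v in (0.5, 0.2, 0.1, 0.05):
        print(v, tail_sum(v))
```

As $v$ decreases (that is, as $|\rho| \to 1$ for a fixed scale), the sum is dominated by the shortest nonzero vectors and decays exponentially, in line with the conclusion that $\beta$ can be made smaller than any $\epsilon > 0$.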



August 31, 2006. DRAFT



References 

[1] A. Aaron and B. Girod. Compression with Side Information Using Turbo Codes. In Proc. IEEE Data Compression Conf. (DCC), Snowbird, UT, 2002.

[2] M. Baake and R. V. Moody. Similarity Submodules and Semigroups. In J. Patera, editor, Quasicrystals and Discrete Geometry, pages 1-13. Comm. Fields Institute, American Mathematical Society, Providence, RI, 1998.

[3] R. Barron, B. Chen, and G. W. Wornell. The Duality Between Information Embedding and Source Coding with Side Information and Some Applications. IEEE Trans. Inform. Theory, 49(5):1159-1180, 2003.

[4] J. Barros and S. D. Servetto. Network Information Flow with Correlated Sources. IEEE Trans. Inform. Theory, 52(1):155-170, 2006.

[5] T. Berger. The Information Theory Approach to Communications (G. Longo, ed.), chapter Multiterminal Source Coding. Springer-Verlag, 1978.

[6] T. Berger, Z. Zhang, and H. Viswanathan. The CEO Problem. IEEE Trans. Inform. Theory, 42(3):887-902, 1996.

[7] M. Bernstein, N. J. A. Sloane, and P. E. Wright. On Sublattices of the Hexagonal Lattice. Discrete Math., 170:29-39, 1997.

[8] N. Bourbaki. Éléments de Mathématiques. Hermann, 1958. Livre II (Algèbre), Chapitre I (Structures Algébriques).

[9] M. Chiang and S. Boyd. Geometric Programming Duals of Channel Capacity and Rate Distortion. IEEE Trans. Inform. Theory, 50(2):245-258, 2004.

[10] J. H. Conway, E. M. Rains, and N. J. A. Sloane. On the Existence of Similar Sublattices. Canad. J. Math., 51:1300-1306, 1999.

[11] J. H. Conway and N. J. A. Sloane. Sphere Packings, Lattices and Groups. Springer Verlag, 3rd edition, 1998.

[12] M. H. M. Costa. Writing on Dirty Paper. IEEE Trans. Inform. Theory, IT-29(3):439-441, 1983.

[13] T. M. Cover. A Proof of the Data Compression Theorem of Slepian and Wolf for Ergodic Sources. IEEE Trans. Inform. Theory, IT-21(2):226-228, 1975.

[14] T. M. Cover and M. Chiang. Duality Between Channel Capacity and Rate Distortion with Two-Sided State Information. IEEE Trans. Inform. Theory, 48(6):1629-1638, 2002.

[15] T. M. Cover and J. Thomas. Elements of Information Theory. John Wiley and Sons, Inc., 1991.

[16] Z. Cvetkovic and M. Vetterli. Error-Rate Characteristics of Oversampled Analog-to-Digital Conversion. IEEE Trans. Inform. Theory, 44(5):1961-1964, 1998.

[17] J.-J. Fuchs and B. Delyon. Minimal L1-Norm Reconstruction Function for Oversampled Signals: Applications to Time-Delay Estimation. IEEE Trans. Inform. Theory, 46(4):1666-1673, 2000.

[18] A. Gersho. Asymptotically Optimal Block Quantization. IEEE Trans. Inform. Theory, IT-25(4):373-380, 1979.

[19] V. K. Goyal, M. Vetterli, and N. T. Thao. Quantized Overcomplete Expansions in R^N: Analysis, Synthesis, and Algorithms. IEEE Trans. Inform. Theory, 44(1):16-31, 1998.

[20] R. M. Gray and D. L. Neuhoff. Quantization. IEEE Trans. Inform. Theory, 44(6):2325-2383, 1998.

[21] M. Grossglauser and D. Tse. Mobility Increases the Capacity of Ad Hoc Wireless Networks. IEEE Trans. Networking, 10(4):477-486, 2002.

[22] P. Gupta and P. R. Kumar. The Capacity of Wireless Networks. IEEE Trans. Inform. Theory, 46(2):388-404, 2000.

[23] P. Gupta and P. R. Kumar. Towards an Information Theory of Large Networks: An Achievable Rate Region. IEEE Trans. Inform. Theory, 49(8):1877-1894, 2003.






[24] C. Heegard and T. Berger. Rate Distortion when Side Information May Be Absent. IEEE Trans. Inform. Theory, IT-31(6):727-734, 1985.

[25] A. H. Kaspi and T. Berger. Rate-Distortion for Correlated Sources with Partially Separated Encoders. IEEE Trans. Inform. Theory, IT-28(6):828-840, 1982.

[26] H. Krim, D. Tucker, S. Mallat, and D. Donoho. On Denoising and Best Signal Representation. IEEE Trans. Inform. Theory, 45(7):2225-2238, 1999.

[27] S. R. Kulkarni and P. Viswanath. A Deterministic Approach to Throughput Scaling in Wireless Networks. IEEE Trans. Inform. Theory, 50(6):1041-1049, 2004.

[28] G. N. Lilis, M. Zhao, and S. D. Servetto. Distributed Sensing and Actuation on Wave Fields. In Proc. 2nd Sensor and Actor Networks Protocols and Applications (SANPA), Boston, MA, 2004.

[29] Z. Liu, S. Cheng, A. Liveris, and Z. Xiong. Slepian-Wolf Coded Nested Quantization (SWC-NQ) for Wyner-Ziv Coding: Performance Analysis and Code Design. In Proc. IEEE Data Compression Conf. (DCC), Snowbird, UT, 2004.

[30] N. Merhav and S. Shamai. On Joint Source-Channel Coding for the Wyner-Ziv Source and the Gel'fand-Pinsker Channel. IEEE Trans. Inform. Theory, 49(11):2844-2855, 2003.

[31] P. Mitran and J. Bajcsy. Coding for the Wyner-Ziv Problem with Turbo-Like Codes. In Proc. IEEE Int. Symp. Inform. Theory, Lausanne, Switzerland, 2002.

[32] C. Peraki and S. D. Servetto. On the Maximum Stable Throughput Problem in Random Networks with Directional Antennas. In Proc. ACM MobiHoc, Annapolis, MD, 2003.

[33] C. Peraki and S. D. Servetto. Capacity, Stability and Flows in Large-Scale Random Networks. In Proc. IEEE Inform. Theory Workshop (ITW), San Antonio, TX, 2004.

[34] S. S. Pradhan, J. Chou, and K. Ramchandran. Duality Between Source Coding and Channel Coding and its Extension to the Side Information Case. IEEE Trans. Inform. Theory, 49(5):1181-1203, 2003.

[35] S. S. Pradhan and K. Ramchandran. Distributed Source Coding Using Syndromes (DISCUS): Design and Construction. In Proc. IEEE Data Compression Conf. (DCC), Snowbird, UT, 1999.

[36] S. S. Pradhan and K. Ramchandran. Distributed Source Coding: Symmetric Rates and Applications to Sensor Networks. In Proc. IEEE Data Compression Conf. (DCC), Snowbird, UT, 2000.

[37] D. Rebollo-Monedero, R. Zhang, and B. Girod. Design of Optimal Quantizers for Distributed Source Coding. In Proc. IEEE Data Compression Conf. (DCC), Snowbird, UT, 2003.

[38] A. Scaglione and S. D. Servetto. On the Interdependence of Routing and Data Compression in Multi-Hop Sensor Networks. Wireless Networks, 11(1-2):149-160, 2005. Special issue with selected (and revised) papers from ACM MobiCom 2002.

[39] S. D. Servetto. Lattice Quantization with Side Information. In Proc. IEEE Data Compression Conf. (DCC), Snowbird, UT, 2000.

[40] S. D. Servetto. On the Feasibility of Large-Scale Wireless Sensor Networks. In Proc. 40th Allerton Conf. on Communication, Control and Computing, Urbana, IL, 2002.

[41] S. D. Servetto and J. M. Rosenblatt. The Multiterminal Source Coding Problem for Spatial Waves. In Proc. UCSD Wkshp. Inform. Theory App., San Diego, CA, 2006. Invited paper.

[42] S. Shamai, S. Verdú, and R. Zamir. Systematic Lossy Source/Channel Coding. IEEE Trans. Inform. Theory, 44(2):564-579, 1998.

[43] C. E. Shannon. Coding Theorems for a Discrete Source with a Fidelity Criterion. IRE Nat. Conv. Rec., 4:142-163, 1959.






[44] D. Slepian and J. K. Wolf. Noiseless Coding of Correlated Information Sources. IEEE Trans. Inform. Theory, IT-19(4):471-480, 1973.

[45] H. Stark and J. Woods. Probability, Random Processes, and Estimation Theory for Engineers (2nd ed.). Prentice Hall, 1994.

[46] J. K. Su, J. J. Eggers, and B. Girod. Channel Coding and Rate Distortion with Side Information: Geometric Interpretation and Illustration of Duality. Submitted to the IEEE Trans. Inform. Theory.

[47] N. T. Thao and M. Vetterli. Reduction of the MSE in R-times Oversampled A/D Conversion from O(1/R) to O(1/R^2). IEEE Trans. Signal Processing, 42(1):200-203, 1994.

[48] T. Tian, J. Garcia-Frias, and W. Zhong. Compression of Correlated Sources using LDPC Codes. In Proc. IEEE Data Compression Conf. (DCC), Snowbird, UT, 2003.

[49] S. Toumpis and A. J. Goldsmith. Capacity Regions for Wireless Adhoc Networks. IEEE Trans. Wireless Comm., 2(4):736-748, 2003.

[50] S. Y. Tung. Multiterminal Source Coding. PhD thesis, Cornell University, 1978.

[51] V. A. Vaishampayan, N. J. A. Sloane, and S. D. Servetto. Multiple Description Vector Quantization with Lattice Codebooks: Design and Analysis. IEEE Trans. Inform. Theory, 47(5):1718-1734, 2001.

[52] S. Verdú. Spectral Efficiency in the Wideband Regime. IEEE Trans. Inform. Theory, 48(6):1319-1343, 2002.

[53] H. Viswanathan and T. Berger. The Quadratic-Gaussian CEO Problem. IEEE Trans. Inform. Theory, 43(5):1549-1559, 1997.

[54] A. D. Wyner. On Source Coding with Side Information at the Decoder. IEEE Trans. Inform. Theory, IT-21(3):294-300, 1975.

[55] A. D. Wyner. The Rate-Distortion Function for Source Coding with Side Information at the Decoder-II: General Sources. Inform. Contr., 38:60-80, 1978.

[56] A. D. Wyner and J. Ziv. The Rate-Distortion Function for Source Coding with Side Information at the Decoder. IEEE Trans. Inform. Theory, IT-22(1):1-10, 1976.

[57] L.-L. Xie and P. R. Kumar. A Network Information Theory for Wireless Communication: Scaling Laws and Optimal Operation. IEEE Trans. Inform. Theory, 50(5):748-767, 2004.

[58] P. Zador. Asymptotic Quantization Error of Continuous Signals and the Quantization Dimension. IEEE Trans. Inform. Theory, IT-28(2):139-149, 1982.

[59] R. Zamir. The Rate Loss in the Wyner-Ziv Problem. IEEE Trans. Inform. Theory, 42(6):2073-2084, 1996.

[60] R. Zamir and S. Shamai. Nested Linear/Lattice Codes for Wyner-Ziv Encoding. In Proc. IEEE Inform. Theory Workshop, Killarney, Ireland, 1998.

[61] R. Zamir, S. Shamai, and U. Erez. Nested Linear/Lattice Codes for Structured Multiterminal Binning. IEEE Trans. Inform. Theory, 48(6):1250-1276, 2002.

[62] Q. Zhao and M. Effros. Optimal Code Design for Lossless and Near Lossless Source Coding in Multiple Access Networks. In Proc. IEEE Data Compression Conf. (DCC), Snowbird, UT, 2001.






Sergio D. Servetto was born in Argentina, on January 18, 1968. He received a Licenciatura en Informática
from Universidad Nacional de La Plata (UNLP, Argentina) in 1992, and the M.Sc. degree in Electrical
Engineering and the Ph.D. degree in Computer Science from the University of Illinois at Urbana-Champaign
(UIUC), in 1996 and 1999. Between 1999 and 2001, he worked at the École Polytechnique Fédérale
de Lausanne (EPFL), Lausanne, Switzerland. Since Fall 2001, he has been an Assistant Professor in the
School of Electrical and Computer Engineering at Cornell University, and a member of the fields of 
Applied Mathematics and Computer Science. He was the recipient of the 1998 Ray Ozzie Fellowship, given to "outstanding 
graduate students in Computer Science," and of the 1999 David J. Kuck Outstanding Thesis Award, for the best doctoral 
dissertation of the year, both from the Dept. of Computer Science at UIUC. He was also the recipient of a 2003 NSF CAREER 
Award. His research interests are centered around information theoretic aspects of networked systems, with a current emphasis 
on problems that arise in the context of large-scale sensor networks. 






